OM nucleic - nucleic search, using sw model

Run on:         April 29, 2004, 14:53:14 ; Search time 5104.38 Seconds
                (without alignments)
                9184.983 Million cell updates/sec

Title:          US-09-989-981A-9_COPY_3436_5005
Perfect score:  1570
Sequence:       1 cgaagcatcctgaagtacag ctagagagcaaacccagagc 1570

Scoring table:  IDENTITY_NUC
                Gapop 10.0 , Gapext 1.0

Searched:       27513289 seqs, 14931090276 residues

Total number of hits satisfying chosen parameters:      55026578

Minimum DB seq length: 0
Maximum DB seq length: 2000000000

Post-processing: Minimum Match 0%
                 Maximum Match 100%
                 Listing first 50 summaries

Database :      EST:*
                1:  em_estba:*
                2:  em_esthum:*
                3:  em_estin:*
                4:  em_estmu:*
                5:  em_estov:*
                6:  em_estpl:*
                7:  em_estro:*
                8:  em_htc:*
                9:  gb_est1:*
                10: gb_est2:*
                11: gb_htc:*
                12: gb_est3:*
                13: gb_est4:*
                14: gb_est5:*
                15: em_estfun:*
                16: em_estom:*
                17: em_gss_hum:*
                18: em_gss_inv:*
                19: em_gss_pln:*
                20: em_gss_vrt:*
                21: em_gss_fun:*
                22: em_gss_mam:*
                23: em_gss_mus:*
                24: em_gss_pro:*
                25: em_gss_rod:*
                26: em_gss_phg:*
                27: em_gss_vrl:*
                28: gb_gss1:*
                29: gb_gss2:*

Pred. No. is the number of results predicted by chance to have a 
score greater than or equal to the score of the result being printed, 
and is derived by analysis of the total score distribution. 
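
The header above names a Smith-Waterman ("sw") model with the IDENTITY_NUC scoring table and the gap parameters Gapop 10.0 and Gapext 1.0. As a rough illustration only, the Python sketch below computes a local alignment score with affine gap penalties. The function name, the match and mismatch values (+1.0 and -0.6), and the convention that a gap of length k costs gap_open + (k - 1) * gap_ext are assumptions made for this sketch; the report does not say how the search program defines them.

```python
def sw_affine_score(a, b, match=1.0, mismatch=-0.6, gap_open=10.0, gap_ext=1.0):
    """Best local alignment score of a against b with affine gap penalties.

    All scoring values are illustrative assumptions, not documented settings.
    """
    neg_inf = float("-inf")
    h_prev = [0.0] * (len(b) + 1)   # best scores for the previous row
    f = [neg_inf] * (len(b) + 1)    # per column: best score ending in a gap in b
    best = 0.0
    for i in range(1, len(a) + 1):
        h_cur = [0.0] * (len(b) + 1)
        e = neg_inf                 # current row: best score ending in a gap in a
        for j in range(1, len(b) + 1):
            # Either extend an existing gap or open a new one.
            e = max(e - gap_ext, h_cur[j - 1] - gap_open)
            f[j] = max(f[j] - gap_ext, h_prev[j] - gap_open)
            s = match if a[i - 1] == b[j - 1] else mismatch
            # Local alignment: a cell never drops below zero.
            h_cur[j] = max(0.0, h_prev[j - 1] + s, e, f[j])
            best = max(best, h_cur[j])
        h_prev = h_cur
    return best
```

The sketch keeps only two rows of the score matrix, so it returns the score but not the alignment itself; recovering the Qy/Db pairing shown under ALIGNMENTS would additionally need a traceback.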
ALIGNMENTS



RESULT 1 

BB605863/c 

LOCUS       BB605863                 306 bp    mRNA    linear   EST 05-DEC-2000
DEFINITION  BB605863 RIKEN full-length enriched, 0 day neonate lung Mus
            musculus cDNA clone E030013I04 5', mRNA sequence.
ACCESSION   BB605863
VERSION     BB605863.1  GI:11557265
KEYWORDS    EST.
SOURCE      Mus musculus (house mouse)
  ORGANISM  Mus musculus
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Rodentia; Sciurognathi; Muridae; Murinae; Mus.
REFERENCE   1  (bases 1 to 306)
  AUTHORS   Aizawa,K., Akahira,S., Akimura,T., Arai,A., Arakawa,T.,
            Carninci,P., Hanagaki,T., Hayatsu,N., Hiraoka,T., Hirozane,T.,
            Hodoyama,Y., Imotani,K., Ishii,Y., Itoh,M., Izawa,M., Kawai,J.,
            Kojima,Y., Konno,H., Kusakabe,M., Matsuyama,T., Miyazaki,A.,
            Nakamura,M., Nishi,K., Nomura,K., Numazaki,R., Okazaki,Y.,
            Okido,T., Owa,C., Sakai,C., Sakai,K., Sasaki,D., Sato,K.,
            Shibata,K., Shibata,Y., Shinagawa,A., Shiraki,T., Sogabe,Y.,
            Suzuki,H., Tagawa,A., Takahashi,F., Tanaka,T., Toya,T.,
            Watahiki,A., Yamamura,T., Yasunishi,A., Yoshida,K., Yoshiki,A.,
            Muramatsu,M. and Hayashizaki,Y.
  TITLE     RIKEN Mouse ESTs (Aizawa,K. et al. 2000)
  JOURNAL   Unpublished (2000)
COMMENT     Contact: Yoshihide Hayashizaki
            Laboratory for Genome Exploration Research Group, RIKEN Genomic
            Sciences Center (GSC), Yokohama Institute
            The Institute of Physical and Chemical Research (RIKEN)
            1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa 230-0045, Japan
            Tel: 81-45-503-9222
            Fax: 81-45-503-9216
            Email: genome-res@gsc.riken.go.jp,
            URL: http://genome.gsc.riken.go.jp/
            Carninci,P., Nishiyama,Y., Westover,A., Itoh,M., Nagaoka,S.,
            Sasaki,N., Okazaki,Y., Muramatsu,M. and Hayashizaki,Y.
            Thermostabilization and thermoactivation of thermolabile enzymes by
            trehalose and its application for the synthesis of full length
            cDNA. Proc. Natl. Acad. Sci. U.S.A. 95 (2), 520-524 (1998)
            Itoh,M., Kitsunai,T., Akiyama,J., Shibata,K., Izawa,M., Kawai,J.,
            Tomaru,Y., Carninci,P., Shibata,Y., Ozawa,Y., Muramatsu,M.,
            Okazaki,Y. and Hayashizaki,Y.
            Automated filtration-based high-throughput plasmid preparation
            system. Genome Res. 9 (5), 463-470 (1999)
            Carninci,P. and Hayashizaki,Y.
            High-efficiency full-length cDNA cloning. Methods Enzymol. 303,
            19-44 (1999)
            Please visit our web site (http://genome.rtc.riken.go.jp) for
            further details.
FEATURES             Location/Qualifiers
     source          1..306
                     /organism="Mus musculus"
                     /mol_type="mRNA"
                     /db_xref="taxon:10090"
                     /clone="E030013I04"
                     /tissue_type="lung"
                     /dev_stage="0 day neonate"
                     /lab_host="DH10B"
                     /clone_lib="RIKEN full-length enriched, 0 day neonate
                     lung"
                     /note="Site_1: SalI; Site_2: BamHI; cDNA library was
                     prepared and sequenced in Mouse Genome Encyclopedia
                     Project of Genome Exploration Research Group in Riken
                     Genomic Sciences Center and Genome Science Laboratory in
                     RIKEN. Division of Experimental Animal Research in Riken
                     contributed to prepare mouse tissues. 1st strand cDNA was
                     primed with a primer [5'
                     GAGAGAGAGAGCGGCCGCAACTCGAGTTTTTTTTTTTTTTTTVN 3'], cDNA was
                     prepared by using trehalose thermo-activated reverse
                     transcriptase and subsequently enriched for full-length by
                     cap-trapper. Second strand cDNA was prepared with the
                     primer adapter of sequence [5'
                     GAGAGAGAGATTCTCGAGTTAATTAAATTAATCCCCCCCCCCCCC 3']. cDNA
                     was cleaved with BamHI and XhoI. Vector: a modified
                     pBluescript KS(+) after bulk excision from Lambda FLC I."

ORIGIN 

  Query Match             13.1%;  Score 206.4;  DB 10;  Length 306;
  Best Local Similarity   94.1%;  Pred. No. 3.5e-50;
  Matches  224;  Conservative    1;  Mismatches   12;  Indels    1;  Gaps    1;

Qy        1 CGAAGCATCCTGAAGTACAGTCCCATTCCACAGCTGGGTCTCTTCTTTGGTTTTCTCAGC 60
            |||||| |||||||||||| |||| |||||||||||||||| |||||||||||||||| |
Db      237 CGAAGCGTCCTGAAGTACATTCCCTTTCCACAGCTGGGTCTTTTCTTTGGTTTTCTCACC 178

Qy       61 CATGACCAGTGCTGTTTGTGCCCTTTGTGTGGCCTCCCCTGCTGTTGGGCTCTCTCTGTC 120
            |||||||||||| |||||| |||||||||||||||||||||||||||||||||||| |||
Db      177 CATGACCAGTGCCGTTTGTCCCCTTTGTGTGGCCTCCCCTGCTGTTGGGCTCTCTCCGTC 118

Qy      121 TTTGCTCCTTAGAGCTGGGGCACCTGAGCCCTCCTCTGTGCCAGCCTTTCTCCCAGCATT 180
            ||||||||||||||||||||||||||||||||||||||||||| ||||||||||||||||
Db      117 TTTGCTCCTTAGAGCTGGGGCACCTGAGCCCTCCTCTGTGCCACCCTTTCTCCCAGCATT 58

Qy      181 CCTYTCTGGCAAACACTTCCTATAAACACACCGTGTGTTCTGCCTATTGTCGAGATAA 238
            |||:|||||||||||| ||| || |||||||||||||||||||||||||||||||| |
Db       57 CCTTTCTGGCAAACAC-TCCCATTAACACACCGTGTGTTCTGCCTATTGTCGAGATTA 1
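
The percentages printed with each hit can be cross-checked from the other numbers in the report. The Python sketch below assumes that Query Match is the hit score divided by the Perfect score (1570) and that Best Local Similarity is the number of matches divided by the number of alignment columns; both readings are inferred from the figures in this listing, not taken from program documentation, and the function names are hypothetical.

```python
PERFECT_SCORE = 1570.0  # "Perfect score" from the report header

def query_match_pct(score, perfect_score=PERFECT_SCORE):
    """Hit score as a percentage of the perfect (self-match) score."""
    return 100.0 * score / perfect_score

def best_local_similarity_pct(matches, conservative, mismatches, indels):
    """Identical columns as a percentage of all alignment columns."""
    columns = matches + conservative + mismatches + indels
    return 100.0 * matches / columns

# Result 1: Score 206.4; Matches 224; Conservative 1; Mismatches 12; Indels 1
print(round(query_match_pct(206.4), 1))                     # 13.1
print(round(best_local_similarity_pct(224, 1, 12, 1), 1))   # 94.1
```

Under these assumptions the sketch reproduces the 13.1% and 94.1% reported for Result 1.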



RESULT 2 
BB869579 
LOCUS       BB869579                 339 bp    mRNA    linear   EST 27-NOV-2001
DEFINITION  BB869579 RIKEN full-length enriched, adult male intestinal mucosa
            Mus musculus cDNA clone G630014E22 5', mRNA sequence.
ACCESSION   BB869579
VERSION     BB869579.1  GI:17115789
KEYWORDS    EST.
SOURCE      Mus musculus (house mouse)
  ORGANISM  Mus musculus
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Rodentia; Sciurognathi; Muridae; Murinae; Mus.
REFERENCE   1  (bases 1 to 339)
  AUTHORS   Akimura,T., Arakawa,T., Carninci,P., Furuno,M., Hanagaki,T.,
            Hayatsu,N., Hiramoto,K., Hiraoka,T., Hirozane,T., Imotani,K.,
            Ishii,Y., Ito,M., Kawai,J., Kojima,Y., Konno,H., Kouda,M.,
            Matsuyama,T., Nakamura,M., Nishi,K., Nomura,K., Numasaki,R.,
            Okazaki,Y., Okido,T., Saito,R., Sakai,C., Sakai,K., Sakazume,N.,
            Sasaki,D., Sato,K., Shibata,K., Shinagawa,A., Shiraki,T.,
            Sogabe,Y., Suzuki,H., Tagawa,A., Takahashi,F., Takaku-Akahira,S.,
            Tanaka,T., Tomaru,A., Toya,T., Watahiki,A., Yasunishi,A.,
            Muramatsu,M. and Hayashizaki,Y.
  TITLE     RIKEN Encyclopedia of Mouse Full-length cDNAs (Akimura,T., et al.
            2001)
  JOURNAL   Unpublished (2001)
COMMENT     Contact: Yoshihide Hayashizaki
            Laboratory for Genome Exploration Research Group, RIKEN Genomic
            Sciences Center (GSC), Yokohama Institute
            The Institute of Physical and Chemical Research (RIKEN)
            1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa 230-0045, Japan
            Tel: 81-45-503-9222
            Fax: 81-45-503-9216
            Email: genome-res@gsc.riken.go.jp,
            URL: http://genome.gsc.riken.go.jp/
            Carninci,P., Shibata,Y., Hayatsu,N., Sugahara,Y., Shibata,K.,
            Itoh,M., Konno,H., Okazaki,Y., Muramatsu,M. and Hayashizaki,Y.
            Normalization and subtraction of cap-trapper-selected cDNAs to
            prepare full-length cDNA libraries for rapid discovery of new
            genes. Genome Res. 10 (10), 1617-1630 (2000)
            wagi,K., Fujiwake,S., Inoue,K., Togawa,Y., Izawa,M., Ohara,E.,
            Watahiki,M., Yoneda,Y., Ishikawa,T., Ozawa,K., Tanaka,T.,
            Matsuura,S., Kawai,J., Okazaki,Y., Muramatsu,M., Inoue,Y., Kira,A.
            and Hayashizaki,Y.
            RIKEN integrated sequence analysis (RISA) system--384-format
            sequencing pipeline with 384 multicapillary sequencer. Genome Res.
            10 (11), 1757-1771 (2000)
            Konno,H., Fukunishi,Y., Shibata,K., Itoh,M., Carninci,P.,
            Sugahara,Y. and Hayashizaki,Y.
            Computer-based methods for the mouse full-length cDNA
            encyclopedia: real-time sequence clustering for construction of a
            nonredundant cDNA library. Genome Res. 11 (2), 281-289 (2001)
            Please visit our web site (http://genome.gsc.riken.go.jp) for
            further details.
            e mouse tissues.
FEATURES             Location/Qualifiers
     source          1..339
                     /organism="Mus musculus"
                     /mol_type="mRNA"
                     /strain="C57BL/6J"
                     /db_xref="taxon:10090"
                     /clone="G630014E22"
                     /sex="male"
                     /tissue_type="intestinal mucosa"
                     /dev_stage="adult"
                     /clone_lib="RIKEN full-length enriched, adult male
                     intestinal mucosa"

ORIGIN 

  Query Match             10.6%;  Score 166.4;  DB 10;  Length 339;
  Best Local Similarity   96.6%;  Pred. No. 3.7e-38;
  Matches  170;  Conservative    0;  Mismatches    6;  Indels    0;  Gaps    0;

Qy      403 GCATTTGCTTCCTGCTAGCCATGGGTGAGCTGCCCTTTCTGAGTCCAGAGGGAGCCAGAG 462
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db        2 GCATTTGCTTCCTGCTAGCCATGGGTGAGCTGCCCTTTCTGAGTCCAGAGGGAGCCAGAG 61

Qy      463 GGCCTCACATCAACAGAGGGTCTCTGAGCTCCCTGGAGCAAGGTTCGGTCACGGGCACAG 522
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db       62 GGCCTCACATCAACAGAGGGTCTCTGAGCTCCCTGGAGCAAGGTTCGGTCACGGGCACAG 121

Qy      523 AGGCTCGGCACAGCTTAGGTGTCCTGCATGTGTCCTACAGCGTCAGGTAAGGGGAC 578
            ||||||||||||||||||||||||||||||||||||||||||||||  |  | | |
Db      122 AGGCTCGGCACAGCTTAGGTGTCCTGCATGTGTCCTACAGCGTCAGCAACCGTGTC 177
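
For hits without gaps, the reported scores fit a simple per-column tally. Result 2 has 170 matches, 6 mismatches and no indels, and its Score of 166.4 equals 170 - 6 x 0.6. The Python sketch below encodes that observation; the values +1.0 per match and -0.6 per mismatch are inferred from the numbers in this listing and are an assumption, not a documented definition of the IDENTITY_NUC table, and the function name is hypothetical.

```python
def ungapped_score(matches, mismatches, match=1.0, mismatch=-0.6):
    """Score of a gap-free alignment under the assumed per-column values."""
    return matches * match + mismatches * mismatch

# Result 2: Matches 170; Mismatches 6; reported Score 166.4
print(round(ungapped_score(170, 6), 1))   # 166.4
```

The same assumed values also reproduce the scores of the other gap-free hits in this listing, for example 168 matches and 41 mismatches for a Score of 143.4. They do not settle how a gap or a "Conservative" column is scored, so Result 1 cannot be checked this way.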



RESULT 3 

BB870338/c

LOCUS       BB870338                 303 bp    mRNA    linear   EST 27-NOV-2001
DEFINITION  BB870338 RIKEN full-length enriched, adult male intestinal mucosa
            Mus musculus cDNA clone G630020H06 5', mRNA sequence.
ACCESSION   BB870338
VERSION     BB870338.1  GI:17116548
KEYWORDS    EST.
SOURCE      Mus musculus (house mouse)
  ORGANISM  Mus musculus
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Rodentia; Sciurognathi; Muridae; Murinae; Mus.
REFERENCE   1  (bases 1 to 303)
  AUTHORS   Akimura,T., Arakawa,T., Carninci,P., Furuno,M., Hanagaki,T.,
            Hayatsu,N., Hiramoto,K., Hiraoka,T., Hirozane,T., Imotani,K.,
            Ishii,Y., Ito,M., Kawai,J., Kojima,Y., Konno,H., Kouda,M.,
            Matsuyama,T., Nakamura,M., Nishi,K., Nomura,K., Numasaki,R.,
            Okazaki,Y., Okido,T., Saito,R., Sakai,C., Sakai,K., Sakazume,N.,
            Sasaki,D., Sato,K., Shibata,K., Shinagawa,A., Shiraki,T.,
            Sogabe,Y., Suzuki,H., Tagawa,A., Takahashi,F., Takaku-Akahira,S.,
            Tanaka,T., Tomaru,A., Toya,T., Watahiki,A., Yasunishi,A.,
            Muramatsu,M. and Hayashizaki,Y.
  TITLE     RIKEN Encyclopedia of Mouse Full-length cDNAs (Akimura,T., et al.
            2001)
  JOURNAL   Unpublished (2001)
COMMENT     Contact: Yoshihide Hayashizaki
            Laboratory for Genome Exploration Research Group, RIKEN Genomic
            Sciences Center (GSC), Yokohama Institute
            The Institute of Physical and Chemical Research (RIKEN)
            1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa 230-0045, Japan
            Tel: 81-45-503-9222
            Fax: 81-45-503-9216
            Email: genome-res@gsc.riken.go.jp,
            URL: http://genome.gsc.riken.go.jp/
            Carninci,P., Shibata,Y., Hayatsu,N., Sugahara,Y., Shibata,K.,
            Itoh,M., Konno,H., Okazaki,Y., Muramatsu,M. and Hayashizaki,Y.
            Normalization and subtraction of cap-trapper-selected cDNAs to
            prepare full-length cDNA libraries for rapid discovery of new
            genes. Genome Res. 10 (10), 1617-1630 (2000)
            wagi,K., Fujiwake,S., Inoue,K., Togawa,Y., Izawa,M., Ohara,E.,
            Watahiki,M., Yoneda,Y., Ishikawa,T., Ozawa,K., Tanaka,T.,
            Matsuura,S., Kawai,J., Okazaki,Y., Muramatsu,M., Inoue,Y., Kira,A.
            and Hayashizaki,Y.
            RIKEN integrated sequence analysis (RISA) system--384-format
            sequencing pipeline with 384 multicapillary sequencer. Genome Res.
            10 (11), 1757-1771 (2000)
            Konno,H., Fukunishi,Y., Shibata,K., Itoh,M., Carninci,P.,
            Sugahara,Y. and Hayashizaki,Y.
            Computer-based methods for the mouse full-length cDNA
            encyclopedia: real-time sequence clustering for construction of a
            nonredundant cDNA library. Genome Res. 11 (2), 281-289 (2001)
            Please visit our web site (http://genome.gsc.riken.go.jp) for
            further details.
            e mouse tissues.
FEATURES             Location/Qualifiers
     source          1..303
                     /organism="Mus musculus"
                     /mol_type="mRNA"
                     /strain="C57BL/6J"
                     /db_xref="taxon:10090"
                     /clone="G630020H06"
                     /sex="male"
                     /tissue_type="intestinal mucosa"
                     /dev_stage="adult"
                     /clone_lib="RIKEN full-length enriched, adult male
                     intestinal mucosa"



ORIGIN 



  Query Match             10.5%;  Score 164.4;  DB 10;  Length 303;
  Best Local Similarity   96.6%;  Pred. No. 1.4e-37;
  Matches  168;  Conservative    0;  Mismatches    6;  Indels    0;  Gaps    0;

Qy        1 CGAAGCATCCTGAAGTACAGTCCCATTCCACAGCTGGGTCTCTTCTTTGGTTTTCTCAGC 60
            |||| ||||||||| |||| ||||||||||||||||| |||||||||||||||||||| |
Db      175 CGAACCATCCTGAAATACATTCCCATTCCACAGCTGGTTCTCTTCTTTGGTTTTCTCACC 116

Qy       61 CATGACCAGTGCTGTTTGTGCCCTTTGTGTGGCCTCCCCTGCTGTTGGGCTCTCTCTGTC 120
            ||||||||||||||||||||||||||||||||||||||||||| ||||||||||||||||
Db      115 CATGACCAGTGCTGTTTGTGCCCTTTGTGTGGCCTCCCCTGCTTTTGGGCTCTCTCTGTC 56

Qy      121 TTTGCTCCTTAGAGCTGGGGCACCTGAGCCCTCCTCTGTGCCAGCCTTTCTCCC 174
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db       55 TTTGCTCCTTAGAGCTGGGGCACCTGAGCCCTCCTCTGTGCCAGCCTTTCTCCC 2
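
In the alignment above the Db coordinates count downward (175 to 2), as they do for the other entries flagged with a "/c" suffix. That pattern suggests the hit lies on the complementary strand of the database sequence, although the report itself does not define the suffix. Assuming that reading, the Python sketch below shows the reverse-complement operation that maps a displayed Db segment back to the strand as stored; the function name is hypothetical and the table covers the IUPAC nucleotide codes, since the query contains an ambiguity code (Y).

```python
# Complement table for IUPAC nucleotide codes, upper and lower case.
_BASES = "ACGTRYKMBDHVSWN"
_COMPS = "TGCAYRMKVHDBSWN"
_TABLE = str.maketrans(_BASES + _BASES.lower(), _COMPS + _COMPS.lower())

def reverse_complement(seq):
    """Return the reverse complement of a nucleotide string."""
    return seq.translate(_TABLE)[::-1]

# First six bases of the Db line at position 175 in the alignment above
print(reverse_complement("CGAACC"))   # GGTTCG
```

Applying the function twice returns the original string, which is a quick sanity check when converting coordinates between strands.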



RESULT 4 
BY742680 

LOCUS       BY742680                 658 bp    mRNA    linear   EST 17-DEC-2002
DEFINITION  BY742680 RIKEN full-length enriched, adult male liver tumor Mus
            musculus cDNA clone C730040P06 5', mRNA sequence.
ACCESSION   BY742680
VERSION     BY742680.1  GI:27168376
KEYWORDS    EST.
SOURCE      Mus musculus (house mouse)
  ORGANISM  Mus musculus
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Rodentia; Sciurognathi; Muridae; Murinae; Mus.
REFERENCE   1  (bases 1 to 658)
  AUTHORS   Okazaki,Y., Furuno,M., Kasukawa,T., Adachi,J., Bono,H., Kondo,S.,
            Nikaido,I., Osato,N., Saito,R., Suzuki,H., Yamanaka,I.,
            Kiyosawa,H., Yagi,K., Tomaru,Y., Hasegawa,Y., Nogami,A.,
            Schonbach,C., Gojobori,T., Baldarelli,R., Hill,D.P., Bult,C.,
            Hume,D.A., Quackenbush,J., Schriml,L.M., Kanapin,A., Matsuda,H.,
            Batalov,S., Beisel,K.W., Blake,J.A., Bradt,D., Brusic,V.,
            Chothia,C., Corbani,L.E., Cousins,S., Dalla,E., Dragani,T.A.,
            Fletcher,C.F., Forrest,A., Frazer,K.S., Gaasterland,T.,
            Gariboldi,M., Gissi,C., Godzik,A., Gough,J., Grimmond,S.,
            Gustincich,S., Hirokawa,N., Jackson,I.J., Jarvis,E.D., Kanai,A.,
            Kawaji,H., Kawasawa,Y., Kedzierski,R.M., King,B.L., Konagaya,A.,
            Kurochkin,I.V., Lee,Y., Lenhard,B., Lyons,P.A., Maglott,D.R.,
            Maltais,L., Marchionni,L., McKenzie,L., Miki,H., Nagashima,T.,
            Numata,K., Okido,T., Pavan,W.J., Pertea,G., Pesole,G.,
            Petrovsky,N., Pillai,R., Pontius,J.U., Qi,D., Ramachandran,S.,
            Ravasi,T., Reed,J.C., Reed,D.J., Reid,J., Ring,B.Z., Ringwald,M.,
            Sandelin,A., Schneider,C., Semple,C.A., Setou,M., Shimada,K.,
            Sultana,R., Takenaka,Y., Taylor,M.S., Teasdale,R.D., Tomita,M.,
            Verardo,R., Wagner,L., Wahlestedt,C., Wang,Y., Watanabe,Y.,
            Wells,C., Wilming,L.G., Wynshaw-Boris,A., Yanagisawa,M., Yang,I.,
            Yang,L., Yuan,Z., Zavolan,M., Zhu,Y., Zimmer,A., Carninci,P.,
            Hayatsu,N., Hirozane-Kishikawa,T., Konno,H., Nakamura,M.,
            Sakazume,N., Sato,K., Shiraki,T., Waki,K., Kawai,J., Aizawa,K.,
            Arakawa,T., Fukuda,S., Hara,A., Hashizume,W., Imotani,K., Ishii,Y.,
            Itoh,M., Kagawa,I., Miyazaki,A., Sakai,K., Sasaki,D., Shibata,K.,
            Shinagawa,A., Yasunishi,A., Yoshino,M., Waterston,R., Lander,E.S.,
            Rogers,J., Birney,E. and Hayashizaki,Y.
  TITLE     Analysis of the mouse transcriptome based on functional annotation
            of 60,770 full-length cDNAs
  JOURNAL   Nature 420, 563-573 (2002)
  MEDLINE   22354683
   PUBMED   12466851
COMMENT     Contact: Yoshihide Hayashizaki
            Laboratory for Genome Exploration Research Group, RIKEN Genomic
            Sciences Center (GSC), Yokohama Institute
            The Institute of Physical and Chemical Research (RIKEN)
            1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa 230-0045, Japan
            Tel: 81-45-503-9222
            Fax: 81-45-503-9216
            Email: genome-res@gsc.riken.go.jp,
            URL: http://genome.gsc.riken.go.jp/
            Adachi,J., Aizawa,K., Akimura,T., Arakawa,T., Carninci,P.,
            Fukuda,S., Hashizume,W., Hayashida,K., Hirozane,T., Hori,F.,
            Imotani,K., Ishii,Y., Itoh,M., Kagawa,I., Kawai,J., Kojima,Y.,
            Kondo,S., Konno,H., Koya,S., Miyazaki,A., Murata,M., Nakamura,M.,
            Nomura,K., Numazaki,R., Ohno,M., Ohsato,N., Saito,R., Sakazume,N.,
            Sano,H., Sasaki,D., Sato,K., Shibata,K., Shiraki,T., Tagami,M.,
            Takeda,Y., Waki,K., Watahiki,A., Muramatsu,M. and Hayashizaki,Y.
            Direct Submission
            Computational Analysis of Full-Length Mouse cDNAs Compared with
            Human Genome Sequences Mamm. Genome. 12, 673-677 (2001)
            Normalization and subtraction of cap-trapper-selected cDNAs to
            prepare full-length cDNA libraries for rapid discovery of new
            genes. Genome Res. 10 (10), 1617-1630 (2000)
            RIKEN integrated sequence analysis (RISA) system--384-format
            sequencing pipeline with 384 multicapillary sequencer. Genome Res.
            10 (11), 1757-1771 (2000)
            Computer-based methods for the mouse full-length cDNA
            encyclopedia: real-time sequence clustering for construction of a
            nonredundant cDNA library. Genome Res. 11 (2), 281-289 (2001)
            cDNA library was prepared and sequenced in Mouse Genome
            Encyclopedia Project of Genome Exploration Research Group in Riken
            Genomic Sciences Center and Genome Science Laboratory in RIKEN.
            Division of Experimental Animal Research in Riken contributed to
            prepare mouse tissues.
            Tissue was provided by William A. Held, Roswell Park Cancer
            Institute, Department of Molecular and Cellular Biology, Elm and
            Carlton Streets, Buffalo, NY 14263, whose assistance we gratefully
            acknowledge.
            Please visit our web site (http://genome.gsc.riken.go.jp) for
            further details.
FEATURES             Location/Qualifiers
     source          1..658
                     /organism="Mus musculus"
                     /mol_type="mRNA"
                     /db_xref="taxon:10090"
                     /clone="C730040P06"
                     /sex="male"
                     /tissue_type="liver tumor"
                     /dev_stage="adult"
                     /lab_host="DH10B"
                     /clone_lib="RIKEN full-length enriched, adult male liver
                     tumor"
                     /note="Site_1: SalI; Site_2: BamHI; cDNA library was
                     prepared and sequenced in Mouse Genome Encyclopedia
                     Project of Genome Exploration Research Group in Riken
                     Genomic Sciences Center and Genome Science Laboratory in
                     RIKEN. Division of Experimental Animal Research in Riken
                     contributed to prepare mouse tissues. 1st strand cDNA was
                     primed with a primer [5'
                     GAGAGAGAGAGCGGCCGCAACTCGAGTTTTTTTTTTTTTTTTVN 3'], cDNA was
                     prepared by using trehalose thermo-activated reverse
                     transcriptase and subsequently enriched for full-length by
                     cap-trapper. Second strand cDNA was prepared with the
                     primer adapter of sequence [5'
                     GAGAGAGAGATTCTCGAGTTAATTAAATTAATCCCCCCCCCCCCC 3']. cDNA
                     was cleaved with BamHI and XhoI. Vector: a modified
                     pBluescript KS(+) after bulk excision from Lambda FLC I.
                     Tissue was provided by William A. Held, Roswell Park
                     Cancer Institute, Department of Molecular and Cellular
                     Biology, Elm and Carlton Streets, Buffalo, NY 14263, whose
                     assistance we gratefully acknowledge."

ORIGIN 



  Query Match             10.5%;  Score 164.4;  DB 13;  Length 658;
  Best Local Similarity   96.6%;  Pred. No. 2.2e-37;
  Matches  168;  Conservative    0;  Mismatches    6;  Indels    0;  Gaps    0;

Qy      405 ATTTGCTTCCTGCTAGCCATGGGTGAGCTGCCCTTTCTGAGTCCAGAGGGAGCCAGAGGG 464
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db        1 ATTTGCTTCCTGCTAGCCATGGGTGAGCTGCCCTTTCTGAGTCCAGAGGGAGCCAGAGGG 60

Qy      465 CCTCACATCAACAGAGGGTCTCTGAGCTCCCTGGAGCAAGGTTCGGTCACGGGCACAGAG 524
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db       61 CCTCACATCAACAGAGGGTCTCTGAGCTCCCTGGAGCAAGGTTCGGTCACGGGCACAGAG 120

Qy      525 GCTCGGCACAGCTTAGGTGTCCTGCATGTGTCCTACAGCGTCAGGTAAGGGGAC 578
            ||||||||||||||||||||||||||||||||||||||||||||  |  | | |
Db      121 GCTCGGCACAGCTTAGGTGTCCTGCATGTGTCCTACAGCGTCAGCAACCGTGTC 174



RESULT 5 
BB598373 
LOCUS       BB598373                 713 bp    mRNA    linear   EST 26-OCT-2001
DEFINITION  BB598373 RIKEN full-length enriched, adult male liver tumor Mus
            musculus cDNA clone C730003G04 5', mRNA sequence.
ACCESSION   BB598373
VERSION     BB598373.2  GI:16450340
KEYWORDS    EST.
SOURCE      Mus musculus (house mouse)
  ORGANISM  Mus musculus
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Rodentia; Sciurognathi; Muridae; Murinae; Mus.
REFERENCE   1  (bases 1 to 713)
  AUTHORS   Arakawa,T., Carninci,P., Fukuda,S., Furuno,M., Hanagaki,T.,
            Hara,A., Hiramoto,K., Hori,F., Ishii,Y., Ito,M., Kawai,J.,
            Konno,H., Kouda,M., Koya,S., Matsuyama,T., Miyazaki,A., Nomura,K.,
            Ohno,M., Okazaki,Y., Okido,T., Saito,R., Sakai,C., Sakai,K.,
            Sano,H., Sasaki,D., Shibata,K., Shinagawa,A., Shiraki,T.,
            Sogabe,Y., Suzuki,H., Tagami,M., Tagawa,A., Takahashi,F.,
            Takeda,Y., Tanaka,T., Toya,T., Muramatsu,M. and Hayashizaki,Y.
  TITLE     RIKEN Mouse ESTs (Arakawa,T., et al. 2001)
  JOURNAL   Unpublished (2001)
COMMENT     On Dec 1, 2000 this sequence version replaced gi:11506974.
            Contact: Yoshihide Hayashizaki
            Laboratory for Genome Exploration Research Group, RIKEN Genomic
            Sciences Center (GSC), Yokohama Institute
            The Institute of Physical and Chemical Research (RIKEN)
            1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa 230-0045, Japan
            Tel: 81-45-503-9222
            Fax: 81-45-503-9216
            Email: genome-res@gsc.riken.go.jp,
            URL: http://genome.gsc.riken.go.jp/
            Carninci,P., Shibata,Y., Hayatsu,N., Sugahara,Y., Shibata,K.,
            Itoh,M., Konno,H., Okazaki,Y., Muramatsu,M. and Hayashizaki,Y.
            Normalization and subtraction of cap-trapper-selected cDNAs to
            prepare full-length cDNA libraries for rapid discovery of new
            genes. Genome Res. 10 (10), 1617-1630 (2000)
            wagi,K., Fujiwake,S., Inoue,K., Togawa,Y., Izawa,M., Ohara,E.,
            Watahiki,M., Yoneda,Y., Ishikawa,T., Ozawa,K., Tanaka,T.,
            Matsuura,S., Kawai,J., Okazaki,Y., Muramatsu,M., Inoue,Y., Kira,A.
            and Hayashizaki,Y.
            RIKEN integrated sequence analysis (RISA) system--384-format
            sequencing pipeline with 384 multicapillary sequencer. Genome Res.
            10 (11), 1757-1771 (2000)
            Konno,H., Fukunishi,Y., Shibata,K., Itoh,M., Carninci,P.,
            Sugahara,Y. and Hayashizaki,Y.
            Computer-based methods for the mouse full-length cDNA
            encyclopedia: real-time sequence clustering for construction of a
            nonredundant cDNA library. Genome Res. 11 (2), 281-289 (2001)
            Kondo,S., Shinagawa,A., Saito,T., Kiyosawa,H., Yamanaka,I.,
            Aizawa,K., Fukuda,S., Hara,A., Itoh,M., Kawai,J., Shibata,K. and
            Hayashizaki,Y.
            Computational Analysis of Full-Length Mouse cDNAs Compared with
            Human Genome Sequences Mamm. Genome. 12, 673-677 (2001)
            Please visit our web site (http://genome.gsc.riken.go.jp/) for
            further details.
            cDNA library was prepared and sequenced in Mouse Genome
            Encyclopedia Project of Genome Exploration Research Group in Riken
            Genomic Sciences Center and Genome Science Laboratory in RIKEN.
            Division of Experimental Animal Research in Riken contributed to
            prepare mouse tissues.
FEATURES             Location/Qualifiers
     source          1..713
                     /organism="Mus musculus"
                     /mol_type="mRNA"
                     /db_xref="taxon:10090"
                     /clone="C730003G04"
                     /sex="male"
                     /tissue_type="liver tumor"
                     /dev_stage="adult"
                     /lab_host="DH10B"
                     /clone_lib="RIKEN full-length enriched, adult male liver
                     tumor"
                     /note="Site_1: SalI; Site_2: BamHI; cDNA library was
                     prepared and sequenced in Mouse Genome Encyclopedia
                     Project of Genome Exploration Research Group in Riken
                     Genomic Sciences Center and Genome Science Laboratory in
                     RIKEN. Division of Experimental Animal Research in Riken
                     contributed to prepare mouse tissues. 1st strand cDNA was
                     primed with a primer [5'
                     GAGAGAGAGAGCGGCCGCAACTCGAGTTTTTTTTTTTTTTTTVN 3'], cDNA was
                     prepared by using trehalose thermo-activated reverse
                     transcriptase and subsequently enriched for full-length by
                     cap-trapper. Second strand cDNA was prepared with the
                     primer adapter of sequence [5'
                     GAGAGAGAGATTCTCGAGTTAATTAAATTAATCCCCCCCCCCCCC 3']. cDNA
                     was cleaved with BamHI and XhoI. Vector: a modified
                     pBluescript KS(+) after bulk excision from Lambda FLC I.
                     Tissue was provided by William A. Held, Roswell Park
                     Cancer Institute, Department of Molecular and Cellular
                     Biology, Elm and Carlton Streets, Buffalo, NY 14263, whose
                     assistance we gratefully acknowledge."

ORIGIN 



  Query Match             10.5%;  Score 164.4;  DB 10;  Length 713;
  Best Local Similarity   96.6%;  Pred. No. 2.4e-37;
  Matches  168;  Conservative    0;  Mismatches    6;  Indels    0;  Gaps    0;

Qy      405 ATTTGCTTCCTGCTAGCCATGGGTGAGCTGCCCTTTCTGAGTCCAGAGGGAGCCAGAGGG 464
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db        1 ATTTGCTTCCTGCTAGCCATGGGTGAGCTGCCCTTTCTGAGTCCAGAGGGAGCCAGAGGG 60

Qy      465 CCTCACATCAACAGAGGGTCTCTGAGCTCCCTGGAGCAAGGTTCGGTCACGGGCACAGAG 524
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db       61 CCTCACATCAACAGAGGGTCTCTGAGCTCCCTGGAGCAAGGTTCGGTCACGGGCACAGAG 120

Qy      525 GCTCGGCACAGCTTAGGTGTCCTGCATGTGTCCTACAGCGTCAGGTAAGGGGAC 578
            ||||||||||||||||||||||||||||||||||||||||||||  |  | | |
Db      121 GCTCGGCACAGCTTAGGTGTCCTGCATGTGTCCTACAGCGTCAGCAACCGTGTC 174



RESULT 6 

AZ051299/c 

LOCUS       AZ051299                 663 bp    DNA     linear   GSS 28-MAR-2001
DEFINITION  sito0006 Human Homo sapiens genomic clone CITB-978SK-B 569J16 T7
            end, genomic survey sequence.
ACCESSION   AZ051299
VERSION     AZ051299.1  GI:13470256
KEYWORDS    GSS.
SOURCE      Homo sapiens (human)
  ORGANISM  Homo sapiens
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Primates; Catarrhini; Hominidae; Homo.
REFERENCE   1  (bases 1 to 663)
  AUTHORS   Lu,K., Lee,M. and Patel,S.B.
  TITLE     High-Resolution Physical and Transcript Map of Human Chromosome
            2p21 Containing the Sitosterolemia Locus
  JOURNAL   Unpublished (2000)
COMMENT     Contact: Patel SB
            Division of Endocrinology, Diabetes and Medical Genetics
            Medical University of South Carolina
            Strom Thurmond Bldg., Room 541, 114 Doughty Street, Charleston, SC
            29403, USA
            Tel: 843 876 5227
            Fax: 843 876 5133
            Email: patelsb@musc.edu
            Seq primer: T7
            Class: BAC ends.
FEATURES             Location/Qualifiers
     source          1..663
                     /organism="Homo sapiens"
                     /mol_type="genomic DNA"
                     /db_xref="taxon:9606"
                     /clone="CITB-978SK-B 569J16"
                     /clone_lib="Human"



ORIGIN 



  Query Match              9.1%;  Score 143.4;  DB 28;  Length 663;
  Best Local Similarity   80.4%;  Pred. No. 4.5e-31;
  Matches  168;  Conservative    0;  Mismatches   41;  Indels    0;  Gaps    0;

Qy     1082 GGATGGGGTAGGGCTGGGAGCAGGGGTCTGGCACCTTCCAGGACCCTACTCTGCCTTTGC 1141
            ||  ||   |  ||||||  |  |||| |||| ||||||||| ||| |  ||||||||||
Db      211 GGTAGGATCAATGCTGGGGACCTGGGTGTGGCCCCTTCCAGGGCCCCAAGCTGCCTTTGC 152

Qy     1142 CCTTGTGGGATTTCCTTTAAAGCAACCGTGTCGGGCCTTGGTGGAACATCAAATCATGCC 1201
            | |  |||| ||||||||||||| |||| ||  |||| |||||| |||||| ||| ||||
Db      151 CTTCCTGGGGTTTCCTTTAAAGCCACCGCGTGAGGCCCTGGTGGGACATCACATCTTGCC 92

Qy     1202 AGCAGAAGTGGGACAGGCAAATCCTCAAAGATGTCTCCTTGTACATCGAGAGTGGCCAGA 1261
             |||| |||||  |||||| |||||||||||||||||||||||| | ||||| || ||||
Db       91 GGCAGCAGTGGACCAGGCAGATCCTCAAAGATGTCTCCTTGTACGTGGAGAGCGGGCAGA 32

Qy     1262 TTATGTGCATCTTAGGCAGCTCAGGTAAG 1290
            | ||||||||| |||| ||||||||||||
Db       31 TCATGTGCATCCTAGGAAGCTCAGGTAAG 3



RESULT 7 
AA239884 

LOCUS       AA239884                 460 bp    mRNA    linear   EST 03-MAR-1997
DEFINITION  mx81d01.r1 Soares mouse NML Mus musculus cDNA clone IMAGE:692737 5'
            similar to WP:F19B6.4 CE05669 WHITE PROTEIN LIKE ;, mRNA sequence.
ACCESSION   AA239884
VERSION     AA239884.1  GI:1863923
KEYWORDS    EST.
SOURCE      Mus musculus (house mouse)
  ORGANISM  Mus musculus
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Rodentia; Sciurognathi; Muridae; Murinae; Mus.
REFERENCE   1  (bases 1 to 460)
  AUTHORS   Marra,M., Hillier,L., Allen,M., Bowles,M., Dietrich,N., Dubuque,T.,
            Geisel,S., Kucaba,T., Lacy,M., Le,M., Martin,J., Morris,M.,
            Schellenberg,K., Steptoe,M., Tan,F., Underwood,K., Moore,B.,
            Theising,B., Wylie,T., Lennon,G., Soares,B., Wilson,R. and
            Waterston,R.
  TITLE     The WashU-HHMI Mouse EST Project
  JOURNAL   Unpublished (1996)
COMMENT     Contact: Marra M/Mouse EST Project
            WashU-HHMI Mouse EST Project
            Washington University School of MedicineP
            4444 Forest Park Parkway, Box 8501, St. Louis, MO 63108
            Tel: 314 286 1800
            Fax: 314 286 1810
            Email: mouseest@watson.wustl.edu
            This clone is available royalty-free through LLNL; contact the
            IMAGE Consortium (info@image.llnl.gov) for further information.
            MGI:426297
            Possible reversed clone: similarity on wrong strand
            Seq primer: -28m13 rev2 ET from Amersham
            High quality sequence stop: 413.
FEATURES             Location/Qualifiers
     source          1..460
                     /organism="Mus musculus"
                     /mol_type="mRNA"
                     /db_xref="taxon:10090"
                     /clone="IMAGE:692737"
                     /tissue_type="Liver"
                     /lab_host="DH10B"
                     /clone_lib="Soares mouse NML"
                     /note="Vector: pT7T3D-Pac (Pharmacia) with a modified
                     polylinker; Site_1: Not I; Site_2: Eco RI; 1st strand cDNA
                     was primed with a Not I - oligo(dT) primer [5'
                     TGTTACCAATCTGAAGTGGGAGCGGCCGCGAATCTTTTTTTTTTTTTTTTTTT 3'];
                     double-stranded cDNA was ligated to Eco RI adaptors
                     (Pharmacia), digested with Not I and cloned into the Not I
                     and Eco RI sites of the modified pT7T3 vector. Library
                     constructed and normalized by Bento Soares and M.Fatima
                     Bonaldo."



ORIGIN 



  Query Match              8.7%;  Score 137.2;  DB 9;  Length 460;
  Best Local Similarity   97.9%;  Pred. No. 2.6e-29;
  Matches  139;  Conservative    0;  Mismatches    3;  Indels    0;  Gaps    0;

Qy      427 GTGAGCTGCCCTTTCTGAGTCCAGAGGGAGCCAGAGGGCCTCACATCAACAGAGGGTCTC 486
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db        1 GTGAGCTGCCCTTTCTGAGTCCAGAGGGAGCCAGAGGGCCTCACATCAACAGAGGGTCTC 60

Qy      487 TGAGCTCCCTGGAGCAAGGTTCGGTCACGGGCACAGAGGCTCGGCACAGCTTAGGTGTCC 546
            |||||||||||||   ||||||||||||||||||||||||||||||||||||||||||||
Db       61 TGAGCTCCCTGGACATAGGTTCGGTCACGGGCACAGAGGCTCGGCACAGCTTAGGTGTCC 120

Qy      547 TGCATGTGTCCTACAGCGTCAG 568
            ||||||||||||||||||||||
Db      121 TGCATGTGTCCTACAGCGTCAG 142



RESULT 8 
AA237916 
LOCUS       AA237916                 502 bp    mRNA    linear   EST 03-MAR-1997
DEFINITION  mx14e08.r1 Soares mouse NML Mus musculus cDNA clone IMAGE:680198 5'
            similar to SW:BROW_DROME P12428 BROWN PROTEIN. mRNA sequence.
ACCESSION   AA237916
VERSION     AA237916.1  GI:1861938
KEYWORDS    EST.
SOURCE      Mus musculus (house mouse)
  ORGANISM  Mus musculus
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Rodentia; Sciurognathi; Muridae; Murinae; Mus.
REFERENCE   1  (bases 1 to 502)
  AUTHORS   Marra,M., Hillier,L., Allen,M., Bowles,M., Dietrich,N., Dubuque,T.,
            Geisel,S., Kucaba,T., Lacy,M., Le,M., Martin,J., Morris,M.,
            Schellenberg,K., Steptoe,M., Tan,F., Underwood,K., Moore,B.,
            Theising,B., Wylie,T., Lennon,G., Soares,B., Wilson,R. and
            Waterston,R.
  TITLE     The WashU-HHMI Mouse EST Project
  JOURNAL   Unpublished (1996)
COMMENT     Contact: Marra M/Mouse EST Project
            WashU-HHMI Mouse EST Project
            Washington University School of MedicineP
            4444 Forest Park Parkway, Box 8501, St. Louis, MO 63108
            Tel: 314 286 1800
            Fax: 314 286 1810
            Email: mouseest@watson.wustl.edu
            This clone is available royalty-free through LLNL; contact the
            IMAGE Consortium (info@image.llnl.gov) for further information.
            MGI:419902
            Seq primer: -28m13 rev2 ET from Amersham
            High quality sequence stop: 459.
FEATURES             Location/Qualifiers
     source          1..502
                     /organism="Mus musculus"
                     /mol_type="mRNA"
                     /db_xref="taxon:10090"
                     /clone="IMAGE:680198"
                     /tissue_type="Liver"
                     /lab_host="DH10B"
                     /clone_lib="Soares mouse NML"
                     /note="Vector: pT7T3D-Pac (Pharmacia) with a modified
                     polylinker; Site_1: Not I; Site_2: Eco RI; 1st strand cDNA
                     was primed with a Not I - oligo(dT) primer [5'
                     TGTTACCAATCTGAAGTGGGAGCGGCCGCGAATCTTTTTTTTTTTTTTTTTTT 3'];
                     double-stranded cDNA was ligated to Eco RI adaptors
                     (Pharmacia), digested with Not I and cloned into the Not I
                     and Eco RI sites of the modified pT7T3 vector. Library
                     constructed and normalized by Bento Soares and M.Fatima
                     Bonaldo."

ORIGIN 

Query Match 8.5%; Score 133.4; DB 9; Length 502; 

Best Local Similarity 95.8%; Pred. No. 3.8e-28; 

Matches 137; Conservative 0; Mismatches 6; Indels 0; Gaps 0; 

Qy  436 CCTTTCTGAGTCCAGAGGGAGCCAGAGGGCCTCACATCAACAGAGGGTCTCTGAGCTCCC 495
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    2 CCTTTCTGAGTCCAGAGGGAGCCAGAGGGCCTCACATCAACAGAGGGTCTCTGAGCTCCC 61

Qy  496 TGGAGCAAGGTTCGGTCACGGGCACAGAGGCTCGGCACAGCTTAGGTGTCCTGCATGTGT 555
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   62 TGGAGCAAGGTTCGGTCACGGGCACAGAGGCTCGGCACAGCTTAGGTGTCCTGCATGTGT 121

Qy  556 CCTACAGCGTCAGGTAAGGGGAC 578
        |||||||||||| | |  | | |
Db  122 CCTACAGCGTCACGAACCGTGTC 144



RESULT 9 

BY705076/c 

LOCUS BY705076 583 bp mRNA linear EST 16-DEC-2002
DEFINITION BY705076 RIKEN full-length enriched, adult male liver Mus musculus
cDNA clone 1300003C16 5', mRNA sequence.
ACCESSION BY705076
VERSION BY705076.1 GI:27116215
KEYWORDS EST.
SOURCE Mus musculus (house mouse)
ORGANISM Mus musculus
Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
Mammalia; Eutheria; Rodentia; Sciurognathi; Muridae; Murinae; Mus.
REFERENCE 1 (bases 1 to 583)
AUTHORS Okazaki,Y., Furuno,M., Kasukawa,T., Adachi,J., Bono,H., Kondo,S.,
Nikaido,I., Osato,N., Saito,R., Suzuki,H., Yamanaka,I.,
Kiyosawa,H., Yagi,K., Tomaru,Y., Hasegawa,Y., Nogami,A.,
Schonbach,C., Gojobori,T., Baldarelli,R., Hill,D.P., Bult,C.,
Hume,D.A., Quackenbush,J., Schriml,L.M., Kanapin,A., Matsuda,H.,
Batalov,S., Beisel,K.W., Blake,J.A., Bradt,D., Brusic,V.,
Chothia,C., Corbani,L.E., Cousins,S., Dalla,E., Dragani,T.A.,
Fletcher,C.F., Forrest,A., Frazer,K.S., Gaasterland,T.,
Gariboldi,M., Gissi,C., Godzik,A., Gough,J., Grimmond,S.,
Gustincich,S., Hirokawa,N., Jackson,I.J., Jarvis,E.D., Kanai,A.,
Kawaji,H., Kawasawa,Y., Kedzierski,R.M., King,B.L., Konagaya,A.,
Kurochkin,I.V., Lee,Y., Lenhard,B., Lyons,P.A., Maglott,D.R.,
Maltais,L., Marchionni,L., McKenzie,L., Miki,H., Nagashima,T.,
Numata,K., Okido,T., Pavan,W.J., Pertea,G., Pesole,G.,
Petrovsky,N., Pillai,R., Pontius,J.U., Qi,D., Ramachandran,S.,
Ravasi,T., Reed,J.C., Reed,D.J., Reid,J., Ring,B.Z., Ringwald,M.,
Sandelin,A., Schneider,C., Semple,C.A., Setou,M., Shimada,K.,
Sultana,R., Takenaka,Y., Taylor,M.S., Teasdale,R.D., Tomita,M.,
Verardo,R., Wagner,L., Wahlestedt,C., Wang,Y., Watanabe,Y.,
Wells,C., Wilming,L.G., Wynshaw-Boris,A., Yanagisawa,M., Yang,I.,
Yang,L., Yuan,Z., Zavolan,M., Zhu,Y., Zimmer,A., Carninci,P.,
Hayatsu,N., Hirozane-Kishikawa,T., Konno,H., Nakamura,M.,
Sakazume,N., Sato,K., Shiraki,T., Waki,K., Kawai,J., Aizawa,K.,
Arakawa,T., Fukuda,S., Hara,A., Hashizume,W., Imotani,K., Ishii,Y.,
Itoh,M., Kagawa,I., Miyazaki,A., Sakai,K., Sasaki,D., Shibata,K.,
Shinagawa,A., Yasunishi,A., Yoshino,M., Waterston,R., Lander,E.S.,
Rogers,J., Birney,E. and Hayashizaki,Y.

TITLE Analysis of the mouse transcriptome based on functional annotation 

of 60,770 full-length cDNAs 

JOURNAL Nature 420, 563-573 (2002) 

MEDLINE 22354683 
PUBMED 12466851 
COMMENT Contact: Yoshihide Hayashizaki 

Laboratory for Genome Exploration Research Group, RIKEN Genomic
Sciences Center (GSC), Yokohama Institute
The Institute of Physical and Chemical Research (RIKEN)
1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa 230-0045, Japan
Tel: 81-45-503-9222
Fax: 81-45-503-9216
Email: genome-res@gsc.riken.go.jp,
URL: http://genome.gsc.riken.go.jp/

Adachi,J., Aizawa,K., Akimura,T., Arakawa,T., Carninci,P.,
Fukuda,S., Hashizume,W., Hayashida,K., Hirozane,T., Hori,F.,
Imotani,K., Ishii,Y., Itoh,M., Kagawa,I., Kawai,J., Kojima,Y.,
Kondo,S., Konno,H., Koya,S., Miyazaki,A., Murata,M., Nakamura,M.,
Nomura,K., Numazaki,R., Ohno,M., Ohsato,N., Saito,R., Sakazume,N.,
Sano,H., Sasaki,D., Sato,K., Shibata,K., Shiraki,T., Tagami,M.,
Takeda,Y., Waki,K., Watahiki,A., Muramatsu,M. and Hayashizaki,Y.
Direct Submission 

Computational Analysis of Full-Length Mouse cDNAs Compared with 
Human Genome Sequences Mamm. Genome. 12, 673-677 (2001) 

Normalization and subtraction of cap-trapper-selected cDNAs to 
prepare full-length cDNA libraries for rapid discovery of new 
genes. Genome Res. 10 (10), 1617-1630 (2000) 

RIKEN integrated sequence analysis (RISA) system — 384-format 
sequencing pipeline with 384 multicapillary sequencer. Genome Res. 
10 (11), 1757-1771 (2000) 

Computer-based methods for the mouse full-length cDNA 
encyclopedia: real-time sequence clustering for construction of a 
nonredundant cDNA library. Genome Res. 11 (2), 281-289 (2001) 

cDNA library was prepared and sequenced in Mouse Genome 
Encyclopedia Project of Genome Exploration Research Group in Riken 
Genomic Sciences Center and Genome Science Laboratory in RIKEN. 
Division of Experimental Animal Research in Riken contributed to 
prepare mouse tissues. 

Please visit our web site (http://genome.gsc.riken.go.jp) for
further details.

FEATURES Location/Qualifiers
source 1..583
/organism="Mus musculus"
/mol_type="mRNA"
/strain="C57BL/6J"
/db_xref="taxon:10090"
/clone="1300003C16"
/sex="male"
/tissue_type="liver"
/dev_stage="adult"
/clone_lib="RIKEN full-length enriched, adult male liver"



ORIGIN 



Query Match 8.3%; Score 130; DB 13; Length 583;
Best Local Similarity 100.0%; Pred. No. 4.3e-27;
Matches 130; Conservative 0; Mismatches 0; Indels 0; Gaps 0;



Qy    1 CGAAGCATCCTGAAGTACAGTCCCATTCCACAGCTGGGTCTCTTCTTTGGTTTTCTCAGC 60
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  131 CGAAGCATCCTGAAGTACAGTCCCATTCCACAGCTGGGTCTCTTCTTTGGTTTTCTCAGC 72

Qy   61 CATGACCAGTGCTGTTTGTGCCCTTTGTGTGGCCTCCCCTGCTGTTGGGCTCTCTCTGTC 120
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   71 CATGACCAGTGCTGTTTGTGCCCTTTGTGTGGCCTCCCCTGCTGTTGGGCTCTCTCTGTC 12

Qy  121 TTTGCTCCTT 130
        ||||||||||
Db   11 TTTGCTCCTT 2



RESULT 10 

AK004871/c

LOCUS AK004871 3623 bp mRNA linear HTC 20-SEP-2003
DEFINITION Mus musculus adult male liver cDNA, RIKEN full-length enriched
library, clone:1300003C16 product:ATP-BINDING CASSETTE, SUB-FAMILY
G, MEMBER 8 (STEROLIN-2) homolog [Mus musculus], full insert
sequence.
ACCESSION AK004871
VERSION AK004871.1 GI:12836380
KEYWORDS HTC; CAP trapper.
SOURCE Mus musculus (house mouse)
ORGANISM Mus musculus
Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
Mammalia; Eutheria; Rodentia; Sciurognathi; Muridae; Murinae; Mus.
REFERENCE 1
AUTHORS Carninci,P. and Hayashizaki,Y.
TITLE High-efficiency full-length cDNA cloning
JOURNAL Meth. Enzymol. 303, 19-44 (1999)
MEDLINE 99279253
PUBMED 10349636
REFERENCE 2
AUTHORS Carninci,P., Shibata,Y., Hayatsu,N., Sugahara,Y., Shibata,K.,
Itoh,M., Konno,H., Okazaki,Y., Muramatsu,M. and Hayashizaki,Y.
TITLE Normalization and subtraction of cap-trapper-selected cDNAs to
prepare full-length cDNA libraries for rapid discovery of new genes
JOURNAL Genome Res. 10 (10), 1617-1630 (2000)



MEDLINE 20499374
PUBMED 11042159
REFERENCE 3
AUTHORS Shibata,K., Itoh,M., Aizawa,K., Nagaoka,S., Sasaki,N., Carninci,P.,
Konno,H., Akiyama,J., Nishi,K., Kitsunai,T., Tashiro,H., Itoh,M.,
Sumi,N., Ishii,Y., Nakamura,S., Hazama,M., Nishine,T., Harada,A.,
Yamamoto,R., Matsumoto,H., Sakaguchi,S., Ikegami,T., Kashiwagi,K.,
Fujiwake,S., Inoue,K., Togawa,Y., Izawa,M., Ohara,E., Watahiki,M.,
Yoneda,Y., Ishikawa,T., Ozawa,K., Tanaka,T., Matsuura,S., Kawai,J.,
Okazaki,Y., Muramatsu,M., Inoue,Y., Kira,A. and Hayashizaki,Y.
TITLE RIKEN integrated sequence analysis (RISA) system -- 384-format
sequencing pipeline with 384 multicapillary sequencer
JOURNAL Genome Res. 10 (11), 1757-1771 (2000)
MEDLINE 20530913
PUBMED 11076861
REFERENCE 4
AUTHORS The RIKEN Genome Exploration Research Group Phase II Team and the
FANTOM Consortium.
TITLE Functional annotation of a full-length mouse cDNA collection
JOURNAL Nature 409, 685-690 (2001)
REFERENCE 5
AUTHORS The FANTOM Consortium and the RIKEN Genome Exploration Research
Group Phase I & II Team.
TITLE Analysis of the mouse transcriptome based on functional annotation
of 60,770 full-length cDNAs
JOURNAL Nature 420, 563-573 (2002)
REFERENCE 6 (bases 1 to 3623)
AUTHORS Adachi,J., Aizawa,K., Akahira,S., Akimura,T., Arai,A., Aono,H.,
Arakawa,T., Bono,H., Carninci,P., Fukuda,S., Fukunishi,Y.,
Furuno,M., Hanagaki,T., Hara,A., Hayatsu,N., Hiramoto,K.,
Hiraoka,T., Hori,F., Imotani,K., Ishii,Y., Itoh,M., Izawa,M.,
Kasukawa,T., Kato,H., Kawai,J., Kojima,Y., Konno,H., Kouda,M.,
Koya,S., Kurihara,C., Matsuyama,T., Miyazaki,A., Nishi,K.,
Nomura,K., Numazaki,R., Ohno,M., Okazaki,Y., Okido,T., Owa,C.,
Saito,H., Saito,R., Sakai,C., Sakai,K., Sano,H., Sasaki,D.,
Shibata,K., Shibata,Y., Shinagawa,A., Shiraki,T., Sogabe,Y.,
Suzuki,H., Tagami,M., Tagawa,A., Takahashi,F., Tanaka,T.,
Tejima,Y., Toya,T., Yamamura,T., Yasunishi,A., Yoshida,K.,
Yoshino,M., Muramatsu,M. and Hayashizaki,Y.
TITLE Direct Submission
JOURNAL Submitted (10-JUL-2000) Yoshihide Hayashizaki, The Institute of
Physical and Chemical Research (RIKEN), Laboratory for Genome
Exploration Research Group, RIKEN Genomic Sciences Center (GSC),
RIKEN Yokohama Institute; 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama,
Kanagawa 230-0045, Japan (E-mail:genome-res@gsc.riken.go.jp,
URL:http://genome.gsc.riken.go.jp/, Tel:81-45-503-9222,
Fax:81-45-503-9216)
COMMENT Please visit our web site (http://genome.gsc.riken.go.jp/) for
further details.
cDNA library was prepared and sequenced in Mouse Genome
Encyclopedia Project of Genome Exploration Research Group in Riken
Genomic Sciences Center and Genome Science Laboratory in RIKEN.
Division of Experimental Animal Research in Riken contributed to
prepare mouse tissues. First strand cDNA was primed with a primer
[5' GAGAGAGAGAGCGGCCGCAACTCGAGTTTTTTTTTTTTTTTTVN 3'], cDNA was
prepared by using trehalose thermo-activated reverse transcriptase
and subsequently enriched for full-length by cap-trapper. Second
strand cDNA was prepared with the primer adapter of sequence [5'
GAGAGAGAGAAGGATCCAAGAGCTCAATTAATTTAATTAAACCCCCCCCCCC 3']. cDNA was
cleaved with XhoI and SstI. Cloning sites, 5' end: SstI; 3' end:
XhoI. Host: SOLR.
FEATURES Location/Qualifiers
source 1..3623
/organism="Mus musculus"
/mol_type="mRNA"
/strain="C57BL/6J"
/db_xref="FANTOM_DB:1300003C16"
/db_xref="MGI:1896857"
/db_xref="taxon:10090"
/clone="1300003C16"
/sex="male"
/tissue_type="liver"
/clone_lib="RIKEN full-length enriched mouse cDNA library"
/dev_stage="adult"
CDS 69..2090
/note="unnamed protein product; ATP-BINDING CASSETTE,
SUB-FAMILY G, MEMBER 8 (STEROLIN-2) homolog [Mus musculus]
(SWISSPROT|Q9DBM0, evidence: FASTY, 92%ID, 96.7%length,
match=1796)
putative"
/codon_start=1
/protein_id="BAB23630.1"
/db_xref="GI:12836381"
/translation="MAEKTKEETQLWNGTVLQDASQGLQDSLFSSESDNSLYFTYSGQ
SNTLEVRDLTYQVDIASQVPWFEQLAQFKIPWRSHSSQDSCELGIRNLSFKVRSGQML
AIIGSSGCGRASLLDVITGRGHGGKMKSGQIWINGQPSTPQLVRKCVAHVRQHDQLLP
NLTVRETLAFIAQMRLPRTFSQAQRDKRVEDVIAELRLRQCANTRVGNTYVRGVSGGE
RRRVSIGVQLLWNPGILILDEPTSGLDSFTAHNLVTTLSRLAKGNRLVLISLHQPRSD
IFRLFDLVLLMTSGTPIYLGAAQQMVQYFTSIGHPCPRYSNPADFYVDLTSIDRRSKE
REVATVEKAQSLAALFLEKVQGFDDFLWKAEAKELNTSTHTVSLTLTQDTDCGTAVEL
PGMIEQFSTLIRRQISNDFRDLPTLLIHGSEACLMSLIIGFLYYGHGAKQLSFMDTAA
LLFMIGALIPFNVILDWSKCHSERSMLYYELEDGLYTAGPYFFAKILGELPEHCAYV
IIYAMPIYWLTNLRPVPELFLLHFLLWLVVFCCRTMALAASAMLPTFHMSSFFCNAL
YNSFYLTAGFMINLDNLWIVPAWISKLSFLRWCFSGLMQIQFNGHLYTTQIGNFTFSI
LGDTMISAMDLNSHPLYAIYLIVIGISYGFLFLYYLSLKLIKQKSIQDW"
polyA_signal 3605..3610
/note="putative"
polyA_site 3623
/note="putative"



ORIGIN 



Query Match 8.3%; Score 130; DB 11; Length 3623; 

Best Local Similarity 100.0%; Pred. No. 1.3e-26; 

Matches 130; Conservative 0; Mismatches 0; Indels 0; Gaps 0; 

Qy    1 CGAAGCATCCTGAAGTACAGTCCCATTCCACAGCTGGGTCTCTTCTTTGGTTTTCTCAGC 60
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  131 CGAAGCATCCTGAAGTACAGTCCCATTCCACAGCTGGGTCTCTTCTTTGGTTTTCTCAGC 72

Qy   61 CATGACCAGTGCTGTTTGTGCCCTTTGTGTGGCCTCCCCTGCTGTTGGGCTCTCTCTGTC 120
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   71 CATGACCAGTGCTGTTTGTGCCCTTTGTGTGGCCTCCCCTGCTGTTGGGCTCTCTCTGTC 12

Qy  121 TTTGCTCCTT 130
        ||||||||||
Db   11 TTTGCTCCTT 2



RESULT 11 
AI151811/c

LOCUS AI151811 500 bp mRNA linear EST 30-SEP-1998
DEFINITION ui46c10.y1 Sugano mouse embryo mewa Mus musculus cDNA clone
IMAGE:1885458 5', mRNA sequence.
ACCESSION AI151811
VERSION AI151811.1 GI:3680280
KEYWORDS EST.
SOURCE Mus musculus (house mouse)
ORGANISM Mus musculus
Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
Mammalia; Eutheria; Rodentia; Sciurognathi; Muridae; Murinae; Mus.
REFERENCE 1 (bases 1 to 500)
AUTHORS Marra,M., Hillier,L., Allen,M., Bowles,M., Dietrich,N., Dubuque,T.,
Geisel,S., Kucaba,T., Lacy,M., Le,M., Martin,J., Morris,M.,
Schellenberg,K., Steptoe,M., Tan,F., Underwood,K., Moore,B.,
Theising,B., Wylie,T., Lennon,G., Soares,B., Wilson,R. and
Waterston,R.
TITLE The WashU-HHMI Mouse EST Project 

JOURNAL Unpublished (1996) 
COMMENT Contact: Marra M/Mouse EST Project 

WashU-HHMI Mouse EST Project 

Washington University School of MedicineP 

4444 Forest Park Parkway, Box 8501, St. Louis, MO 63108 

Tel: 314 286 1800 

Fax: 314 286 1810 

Email: mouseest@watson.wustl.edu 

This clone is available royalty-free through LLNL; contact the
IMAGE Consortium (info@image.llnl.gov) for further information.
MGI:969782

Seq primer: custom primer used 
High quality sequence stop: 499. 
FEATURES Location/Qualifiers 
source 1..500
/organism="Mus musculus"
/mol_type="mRNA"
/strain="C57BL"
/db_xref="taxon:10090"
/clone="IMAGE:1885458"
/dev_stage="embryo, 14 dpc"
/lab_host="DH10B"
/clone_lib="Sugano mouse embryo mewa"
/note="Vector: pME18S-FL3; Site_1: DraIII (CACTGTGTG);
Site_2: DraIII (CACCATGTG); 1st strand cDNA was primed
with an oligo(dT) primer [ATGTGGCCTTTTTTTTTTTTTTTTT];
double-stranded cDNA was ligated to a DraIII adaptor
[TGTTGGCCTACTGG], digested and cloned into distinct DraIII
sites of the pME18S-FL3 vector (5' site CACTGTGTG, 3' site
CACCATGTG). XhoI should be used to isolate the cDNA
insert. Size selection was performed to exclude fragments
<1.5kb. Library constructed by Dr. Sumio Sugano
(University of Tokyo Institute of Medical Science).
Custom primers for sequencing: 5' end primer
CTTCTGCTCTAAAAGCTGCG and 3' end primer
CGACCTGCAGCTCGAGCACA."

ORIGIN



Query Match 7.8%; Score 123; DB 9; Length 500; 

Best Local Similarity 100.0%; Pred. No. 4.9e-25; 

Matches 123; Conservative 0; Mismatches 0; Indels 0; Gaps 0; 

Qy    1 CGAAGCATCCTGAAGTACAGTCCCATTCCACAGCTGGGTCTCTTCTTTGGTTTTCTCAGC 60
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  123 CGAAGCATCCTGAAGTACAGTCCCATTCCACAGCTGGGTCTCTTCTTTGGTTTTCTCAGC 64

Qy   61 CATGACCAGTGCTGTTTGTGCCCTTTGTGTGGCCTCCCCTGCTGTTGGGCTCTCTCTGTC 120
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   63 CATGACCAGTGCTGTTTGTGCCCTTTGTGTGGCCTCCCCTGCTGTTGGGCTCTCTCTGTC 4

Qy  121 TTT 123
        |||
Db    3 TTT 1



RESULT 12 

BB610072/c 

LOCUS BB610072 510 bp mRNA linear EST 26-OCT-2001
DEFINITION BB610072 RIKEN full-length enriched, adult male liver Mus musculus
cDNA clone 1300007N20 5', mRNA sequence.
ACCESSION BB610072
VERSION BB610072.1 GI:16451685
KEYWORDS EST.
SOURCE Mus musculus (house mouse)
ORGANISM Mus musculus
Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
Mammalia; Eutheria; Rodentia; Sciurognathi; Muridae; Murinae; Mus.
REFERENCE 1 (bases 1 to 510)
AUTHORS Arakawa,T., Carninci,P., Fukuda,S., Furuno,M., Hanagaki,T.,
Hara,A., Hiramoto,K., Hori,F., Ishii,Y., Ito,M., Kawai,J.,
Konno,H., Kouda,M., Koya,S., Matsuyama,T., Miyazaki,A., Nomura,K.,
Ohno,M., Okazaki,Y., Okido,T., Saito,R., Sakai,C., Sakai,K.,
Sano,H., Sasaki,D., Shibata,K., Shinagawa,A., Shiraki,T.,
Sogabe,Y., Suzuki,H., Tagami,M., Tagawa,A., Takahashi,F.,
Takeda,Y., Tanaka,T., Toya,T., Muramatsu,M. and Hayashizaki,Y.
TITLE RIKEN Mouse ESTs (Arakawa,T., et al. 2001)
JOURNAL Unpublished (2001)
COMMENT Contact: Yoshihide Hayashizaki
Laboratory for Genome Exploration Research Group, RIKEN Genomic
Sciences Center (GSC), Yokohama Institute
The Institute of Physical and Chemical Research (RIKEN)
1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa 230-0045, Japan
Tel: 81-45-503-9222
Fax: 81-45-503-9216
Email: genome-res@gsc.riken.go.jp,
URL: http://genome.gsc.riken.go.jp/

Carninci,P., Shibata,Y., Hayatsu,N., Sugahara,Y., Shibata,K.,
Itoh,M., Konno,H., Okazaki,Y., Muramatsu,M. and Hayashizaki,Y.

Normalization and subtraction of cap-trapper-selected cDNAs to
prepare full-length cDNA libraries for rapid discovery of new
genes. Genome Res. 10 (10), 1617-1630 (2000)



wagi,K., Fujiwake,S., Inoue,K., Togawa,Y., Izawa,M., Ohara,E.,
Watahiki,M., Yoneda,Y., Ishikawa,T., Ozawa,K., Tanaka,T.,
Matsuura,S., Kawai,J., Okazaki,Y., Muramatsu,M., Inoue,Y., Kira,A.
and Hayashizaki,Y.

RIKEN integrated sequence analysis (RISA) system -- 384-format
sequencing pipeline with 384 multicapillary sequencer. Genome Res.
10 (11), 1757-1771 (2000)

Konno,H., Fukunishi,Y., Shibata,K., Itoh,M., Carninci,P.,
Sugahara,Y. and Hayashizaki,Y.

Computer-based methods for the mouse full-length cDNA
encyclopedia: real-time sequence clustering for construction of a
nonredundant cDNA library. Genome Res. 11 (2), 281-289 (2001)

Kondo,S., Shinagawa,A., Saito,T., Kiyosawa,H., Yamanaka,I.,
Aizawa,K., Fukuda,S., Hara,A., Itoh,M., Kawai,J., Shibata,K. and
Hayashizaki,Y.

Computational Analysis of Full-Length Mouse cDNAs Compared with
Human Genome Sequences. Mamm. Genome. 12, 673-677 (2001)

Please visit our web site (http://genome.gsc.riken.go.jp) for
further details.
e mouse tissues.

FEATURES Location/Qualifiers
source 1..510
/organism="Mus musculus"
/mol_type="mRNA"
/strain="C57BL/6J"
/db_xref="taxon:10090"
/clone="1300007N20"
/sex="male"
/tissue_type="liver"
/dev_stage="adult"
/clone_lib="RIKEN full-length enriched, adult male liver"

ORIGIN



Query Match 7.8%; Score 123; DB 10; Length 510;
Best Local Similarity 100.0%; Pred. No. 5e-25;
Matches 123; Conservative 0; Mismatches 0; Indels 0; Gaps 0;



Qy    1 CGAAGCATCCTGAAGTACAGTCCCATTCCACAGCTGGGTCTCTTCTTTGGTTTTCTCAGC 60
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  126 CGAAGCATCCTGAAGTACAGTCCCATTCCACAGCTGGGTCTCTTCTTTGGTTTTCTCAGC 67

Qy   61 CATGACCAGTGCTGTTTGTGCCCTTTGTGTGGCCTCCCCTGCTGTTGGGCTCTCTCTGTC 120
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   66 CATGACCAGTGCTGTTTGTGCCCTTTGTGTGGCCTCCCCTGCTGTTGGGCTCTCTCTGTC 7

Qy  121 TTT 123
        |||
Db    6 TTT 4



RESULT 13 

AI157365/c

LOCUS AI157365 511 bp mRNA linear EST 30-SEP-1998
DEFINITION ui45h01.y1 Sugano mouse embryo mewa Mus musculus cDNA clone
IMAGE:1885393 5', mRNA sequence.
ACCESSION AI157365
VERSION AI157365.1 GI:3685834



KEYWORDS EST. 

SOURCE Mus musculus (house mouse) 

ORGANISM Mus musculus 

Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
Mammalia; Eutheria; Rodentia; Sciurognathi; Muridae; Murinae; Mus.
REFERENCE 1 (bases 1 to 511)
AUTHORS Marra,M., Hillier,L., Allen,M., Bowles,M., Dietrich,N., Dubuque,T.,
Geisel,S., Kucaba,T., Lacy,M., Le,M., Martin,J., Morris,M.,
Schellenberg,K., Steptoe,M., Tan,F., Underwood,K., Moore,B.,
Theising,B., Wylie,T., Lennon,G., Soares,B., Wilson,R. and
Waterston,R.
TITLE The WashU-HHMI Mouse EST Project 

JOURNAL Unpublished (1996) 
COMMENT Contact: Marra M/Mouse EST Project 

WashU-HHMI Mouse EST Project 

Washington University School of MedicineP 

4444 Forest Park Parkway, Box 8501, St. Louis, MO 63108 

Tel: 314 286 1800 

Fax: 314 286 1810 

Email: mouseest@watson.wustl.edu 

This clone is available royalty-free through LLNL; contact the
IMAGE Consortium (info@image.llnl.gov) for further information.
MGI:969717

Seq primer: custom primer used 
High quality sequence stop: 480. 
FEATURES Location/Qualifiers 
source 1..511
/organism="Mus musculus"
/mol_type="mRNA"
/strain="C57BL"
/db_xref="taxon:10090"
/clone="IMAGE:1885393"
/dev_stage="embryo, 14 dpc"
/lab_host="DH10B"
/clone_lib="Sugano mouse embryo mewa"
/note="Vector: pME18S-FL3; Site_1: DraIII (CACTGTGTG);
Site_2: DraIII (CACCATGTG); 1st strand cDNA was primed
with an oligo(dT) primer [ATGTGGCCTTTTTTTTTTTTTTTTT];
double-stranded cDNA was ligated to a DraIII adaptor
[TGTTGGCCTACTGG], digested and cloned into distinct DraIII
sites of the pME18S-FL3 vector (5' site CACTGTGTG, 3' site
CACCATGTG). XhoI should be used to isolate the cDNA
insert. Size selection was performed to exclude fragments
<1.5kb. Library constructed by Dr. Sumio Sugano
(University of Tokyo Institute of Medical Science).
Custom primers for sequencing: 5' end primer
CTTCTGCTCTAAAAGCTGCG and 3' end primer
CGACCTGCAGCTCGAGCACA."

ORIGIN 

Query Match 7.3%; Score 115; DB 9; Length 511; 

Best Local Similarity 100.0%; Pred. No. 1.3e-22; 

Matches 115; Conservative 0; Mismatches 0; Indels 0; Gaps 0; 

Qy    1 CGAAGCATCCTGAAGTACAGTCCCATTCCACAGCTGGGTCTCTTCTTTGGTTTTCTCAGC 60
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  116 CGAAGCATCCTGAAGTACAGTCCCATTCCACAGCTGGGTCTCTTCTTTGGTTTTCTCAGC 57

Qy   61 CATGACCAGTGCTGTTTGTGCCCTTTGTGTGGCCTCCCCTGCTGTTGGGCTCTCT 115
        |||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   56 CATGACCAGTGCTGTTTGTGCCCTTTGTGTGGCCTCCCCTGCTGTTGGGCTCTCT 2



RESULT 14 
BH321870/c

LOCUS BH321870 599 bp DNA linear GSS 03-DEC-2001 

DEFINITION CH230-7C13.TVB CHORI-230 Segment 1 Rattus norvegicus genomic clone 

CH230-7C13, genomic survey sequence. 
ACCESSION BH321870 

VERSION BH321870.1 GI: 17252584 

KEYWORDS GSS. 

SOURCE Rattus norvegicus (Norway rat) 

ORGANISM Rattus norvegicus 

Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
Mammalia; Eutheria; Rodentia; Sciurognathi; Muridae; Murinae;
Rattus.
REFERENCE 1 (bases 1 to 599)
AUTHORS Zhao,S., Shetty,J., Shatsman,S., Tsegaye,G., Geer,K.,
Shvartsbeyn,A., Gebregeorgis,E., Overton,L., Russell,D., Chen,D.,
Riggs,F., de Jong,P. and Fraser,C.M.
TITLE Rat BAC End Sequences from Library CHORI-230 EcoRI segment
JOURNAL Unpublished (1999)
COMMENT Other_GSSs: CH230-7C13.TJB

Contact: Shaying Zhao 

Department of Eukaryotic Genomics 

The Institute for Genomic Research 

9712 Medical Center Dr., Rockville, MD 20850, USA 
Tel: 301 838 0200 
Fax: 301 838 0208 
Email: szhao@tigr.org 

Clones are derived from the rat BAC library CHORI-230 
(http://www.chori.org/bacpac/rat230.htm). For BAC library 
availability, please contact Pieter de Jong (pdejong@mail.cho.org). 
Clones may be purchased from BACPAC Resources 

(http://www.chori.org/bacpac/or ering_information.htm). BAC end 
page: http://www.tigr.org/tdb/bac_ends/rat/bac_end_intro.html
Plate: 7 row: C column: 13 
Seq primer: T7 
Class: BAC ends. 
FEATURES Location/Qualifiers 
source 1..599
/organism="Rattus norvegicus"
/mol_type="genomic DNA"
/strain="BN/SsNHsd/MCW"
/db_xref="taxon:10116"
/clone="CH230-7C13"
/sex="Female"
/cell_type="Brain"
/clone_lib="CHORI-230 Segment 1"
/note="Vector: pTARBAC2.1; Site_1: EcoRI; Site_2: EcoRI;
CHORI-230 Rat (BN/SsNHsd/MCW) BAC library produced by
Pieter de Jong"

ORIGIN 



Query Match 7.2%; Score 112.8; DB 28; Length 599;
Best Local Similarity 73.4%; Pred. No. 6.3e-22;
Matches 160; Conservative 1; Mismatches 48; Indels 9; Gaps 1;



Qy 1352 CTAAGCACAATGTTTAAGAAGTRAGTTTAAGTTGTAGAGAGGCAGCCATGCATTTGGCAT 1411
        | ||  | ||     ||||| |:||||||||||| |||||    || |||||||| ||||
Db  599 CCAAAAAAAAAAAAAAAGAAATGAGTTTAAGTTGGAGAGAAAAGGCTATGCATTTAGCAT 540

Qy 1412 TTGAATACAATCTGGTGACTTGTCTGGCTGCCAATAGAACCTAGTACCAAAGTGAAATCT 1471

I I I I I I I II I I I I I I III I I I I I II I I I I I II I I I I II I I I I

Db  539 TTGAACAAAATCTAGTGACTGTGAATAGAACCTGGTATCAAAGTGAAACCT 489

Qy 1472 TGAGGAAAATCCCTGGAAAGAGTGGAAAGTCCTGCCTAACACGTAAGTGCCTTCTTTGCT 1531
        |  | ||| |  ||||| | ||||| ||||||||||| ||  |||||  | |||| ||||
Db  488 TAGGAAAAGTATCTGGAGAAAGTGGGAAGTCCTGCCTGACGTGTAAGGACTTTCTGTGCT 429

Qy 1532 TGTTTGATTGACTGTGATGCTAGAGAGCAAACCCAGAG 1569
        |||||||||||||||| |||| |||| || | ||  ||
Db  428 TGTTTGATTGACTGTGGTGCTGGAGATCAGAGCCTCAG 391



RESULT 15 

BI246567 

LOCUS BI246567 764 bp mRNA linear EST 17-JUL-2001
DEFINITION 602958477F1 NCI_CGAP_Li9 Mus musculus cDNA clone IMAGE:5124187 5',
mRNA sequence.
ACCESSION BI246567
VERSION BI246567.1 GI:14790652
KEYWORDS EST.
SOURCE Mus musculus (house mouse)
ORGANISM Mus musculus
Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
Mammalia; Eutheria; Rodentia; Sciurognathi; Muridae; Murinae; Mus.
REFERENCE 1 (bases 1 to 764)
AUTHORS NIH-MGC http://mgc.nci.nih.gov/.
TITLE National Institutes of Health, Mammalian Gene Collection (MGC)
JOURNAL Unpublished (1999)
COMMENT Contact: Robert Strausberg, Ph.D.
Email: cgapbs-r@mail.nih.gov
Tissue Procurement: Jeffrey E. Green, M.D.
cDNA Library Preparation: Life Technologies, Inc.
cDNA Library Arrayed by: The I.M.A.G.E. Consortium (LLNL)
DNA Sequencing by: Incyte Genomics, Inc.
Clone distribution: MGC clone distribution information can be
found through the I.M.A.G.E. Consortium/LLNL at:
http://image.llnl.gov
Plate: LLAM11303 row: i column: 20
High quality sequence start: 2
High quality sequence stop: 666.
FEATURES Location/Qualifiers
source 1..764
/organism="Mus musculus"
/mol_type="mRNA"
/strain="FVB/N"
/db_xref="taxon:10090"
/clone="IMAGE:5124187"
/lab_host="DH10B (T1 phage-resistant)"
/clone_lib="NCI_CGAP_Li9"
/note="Organ: liver; Vector: pCMV-SPORT6; Site_1: NotI;
Site_2: SalI; Cloned unidirectionally. Primer: Oligo dT.
Average insert size 1.9 kb. Constructed by Life
Technologies. Note: this is a NCI_CGAP Library."

ORIGIN 



Query Match 6.6%; Score 103.4; DB 12; Length 764; 

Best Local Similarity 98.3%; Pred. No. 4.9e-19; 

Matches 115; Conservative 0; Mismatches 1; Indels 1; Gaps 1; 

Qy 1162 AGCAACCGTGTCGGGCCTTGGTGGAACATCAAATCATGCCAGCAGAAGTGGGACAGGCAA 1221
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  141 AGCAACCGTGTCGGGCCTTGGTGGAACATCAAATCATGCCAGCAGAAGTGGGACAGGCAA 200

Qy 1222 ATCCTCAAAGATGTCTCCTTG-TACATCGAGAGTGGCCAGATTATGTGCATCTTAGG 1277
        ||||||||||||||||||||| ||||||||||||||||||||||||||||||||| |
Db  201 ATCCTCAAAGATGTCTCCTTGATACATCGAGAGTGGCCAGATTATGTGCATCTTACG 257



RESULT 16 

AI574075/c 

LOCUS AI574075 435 bp mRNA linear EST 29-MAR-1999
DEFINITION uj67h11.y1 Sugano mouse liver mlia Mus musculus cDNA clone
IMAGE:1925061 5', mRNA sequence.
ACCESSION AI574075
VERSION AI574075.1 GI:4537449
KEYWORDS EST.
SOURCE Mus musculus (house mouse)
ORGANISM Mus musculus
Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
Mammalia; Eutheria; Rodentia; Sciurognathi; Muridae; Murinae; Mus.
REFERENCE 1 (bases 1 to 435)
AUTHORS Marra,M., Hillier,L., Kucaba,T., Martin,J., Beck,C., Wylie,T.,
Underwood,K., Steptoe,M., Theising,B., Allen,M., Bowers,Y.,
Person,B., Swaller,T., Gibbons,M., Pape,D., Harvey,N., Schurk,R.,
Ritter,E., Kohn,S., Shin,T., Jackson,Y., Cardenas,M., McCann,R.,
Waterston,R. and Wilson,R.
TITLE The WashU-NCI Mouse EST Project 1999
JOURNAL Unpublished (1999)
COMMENT Contact: Marra M/WashU-NCI Mouse EST Project 1999
Washington University School of Medicine
4444 Forest Park Parkway, Box 8501, St. Louis, MO 63108, USA
Tel: 314 286 1800
Fax: 314 286 1810
Email: mouseest@watson.wustl.edu
This clone is available royalty-free through LLNL; contact the
IMAGE Consortium (info@image.llnl.gov) for further information.
MGI:981353
Seq primer: custom primer used
High quality sequence stop: 432.
FEATURES Location/Qualifiers
source 1..435
/organism="Mus musculus"
/mol_type="mRNA"
/strain="C57BL"
/db_xref="taxon:10090"
/clone="IMAGE:1925061"
/sex="female"
/dev_stage="adult"
/lab_host="DH10B"
/clone_lib="Sugano mouse liver mlia"
/note="Organ: liver; Vector: pME18S-FL3; Site_1: DraIII
(CACTGTGTG); Site_2: DraIII (CACCATGTG); 1st strand cDNA
was primed with an oligo(dT) primer
[ATGTGGCCTTTTTTTTTTTTTTTTT]; double-stranded cDNA was
ligated to a DraIII adaptor [TGTTGGCCTACTGG], digested
and cloned into distinct DraIII sites of the pME18S-FL3
vector (5' site CACTGTGTG, 3' site CACCATGTG). XhoI should
be used to isolate the cDNA insert. Size selection was
performed to exclude fragments <1.5kb. Library
constructed by Dr. Sumio Sugano (University of Tokyo
Institute of Medical Science). Custom primers for
sequencing: 5' end primer CTTCTGCTCTAAAAGCTGCG and 3' end
primer CGACCTGCAGCTCGAGCACA."

ORIGIN



Query Match 4.1%; Score 64; DB 9; Length 435;
Best Local Similarity 100.0%; Pred. No. 2.2e-07;
Matches 64; Conservative 0; Mismatches 0; Indels 0; Gaps 0;

Qy          1 CGAAGCATCCTGAAGTACAGTCCCATTCCACAGCTGGGTCTCTTCTTTGGTTTTCTCAGC 60
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db         64 CGAAGCATCCTGAAGTACAGTCCCATTCCACAGCTGGGTCTCTTCTTTGGTTTTCTCAGC 5

Qy         61 CATG 64
              ||||
Db          4 CATG 1



RESULT 17 

CD502116 

LOCUS       CD502116                 606 bp    mRNA    linear   EST 12-JUN-2003
DEFINITION  CDA54-H04.xld-t SHGC-CDA Gasterosteus aculeatus cDNA clone
            CDA54-H04 5', mRNA sequence.
ACCESSION   CD502116
VERSION     CD502116.1  GI:31429142
KEYWORDS    EST.
SOURCE      Gasterosteus aculeatus (three spined stickleback)
  ORGANISM  Gasterosteus aculeatus
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Actinopterygii; Neopterygii; Teleostei; Euteleostei; Neoteleostei;
            Acanthomorpha; Acanthopterygii; Percomorpha; Gasterosteiformes;
            Gasterosteidae; Gasterosteus.
REFERENCE   1  (bases 1 to 606)
  AUTHORS   Kingsley,D.M., Peichel,C., Balabahdra,S., Grimwood,J., Dickson,M.,
            Schmutz,J. and Myers,R.M.
  TITLE     Expressed sequence tags from Gasterosteus aculeatus
  JOURNAL   Unpublished (2003)
COMMENT     Contact: Kingsley, DM
            HHMI and Department of Developmental Biology
            Stanford University School of Medicine
            Beckman Center B300, 279 Campus Drive, Stanford, CA 94305-5329, USA
            Tel: 650 725 5954
            Fax: 650 725 7739
            Email: kingsley@cmgm.stanford.edu
            Plate: 54
            High quality sequence stop: 606.
FEATURES             Location/Qualifiers
     source          1..606
                     /organism="Gasterosteus aculeatus"
                     /mol_type="mRNA"
                     /strain="Salinas river, CA"
                     /db_xref="taxon:69293"
                     /clone="CDA54-H04"
                     /sex="mixed male and female"
                     /tissue_type="heads and internal organs combined"
                     /dev_stage="adult"
                     /clone_lib="SHGC-CDA"
                     /note="Vector: lambda ZAP Express/pBK-CMV; Site_1: EcoRI
                     (5' adaptor); Site_2: XhoI (3' linker primer); The mixed
                     organ cDNA library was generated using the ZAP-cDNA method
                     by Stratagene. First strand cDNA synthesis was primed with
                     a a 50 bp linker primer containing an oligo dT sequence
                     preceeded by a synthetic XhoI site. 5 prime adaptors were
                     used containing an EcoRI cohesive end. The finished cDNAs
                     were inserted in to the ZAP express vector
                     unidirectionally in the sense orientation with respect to
                     the lacZ promoter of pBK-CMV. An amplified library was
                     prepared from approximately 3 million primary clones in
                     the lambda ZAP Express vector. In vivo excision was then
                     used to generate individual pBK-CMV phagemid clones for
                     EST sequencing."
ORIGIN

Query Match 3.8%; Score 59.4; DB 14; Length 606;
Best Local Similarity 67.2%; Pred. No. 6.6e-06;
Matches 84; Conservative 0; Mismatches 41; Indels 0; Gaps 0;

Qy       1162 AGCAACCGTGTCGGGCCTTGGTGGAACATCAAATCATGCCAGCAGAAGTGGGACAGGCAA 1221
              ||  | ||||| || || |||||| || |    || | |  | ||   |||    | ||
Db        277 AGTGAGCGTGTGGGTCCGTGGTGGGACTTACCCTCCTTCAGGAAGCGATGGACTCGTCAG 336

Qy       1222 ATCCTCAAAGATGTCTCCTTGTACATCGAGAGTGGCCAGATTATGTGCATCTTAGGCAGC 1281
              |||||||| |||||||||||  || | || || || ||||| ||| ||||  | ||||
Db        337 ATCCTCAATGATGTCTCCTTCCACGTGGACAGCGGGCAGATCATGGGCATACTGGGCAAT 396

Qy       1282 TCAGG 1286
              |||||
Db        397 TCAGG 401



RESULT 18 

BX381961/c
LOCUS       BX381961                1201 bp    mRNA    linear   EST 08-MAY-2003
DEFINITION  BX381961 Homo sapiens PLACENTA COT 25-NORMALIZED Homo sapiens cDNA
            clone CS0DI072YF05 3-PRIME, mRNA sequence.
ACCESSION   BX381961
VERSION     BX381961.1  GI:30453007
KEYWORDS    EST.
SOURCE      Homo sapiens (human)
  ORGANISM  Homo sapiens
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Primates; Catarrhini; Hominidae; Homo.
REFERENCE   1  (bases 1 to 1201)
  AUTHORS   Li,W.B., Gruber,C., Jessee,J. and Polayes,D.
  TITLE     Full-length cDNA libraries and normalization
  JOURNAL   Unpublished (2001)
COMMENT     Contact: Genoscope
            Genoscope - Centre National de Sequencage
            BP 191 91006 EVRY cedex - France
            Email: seqref@genoscope.cns.fr, Web : www.genoscope.cns.fr
            Library was constructed by Life Technologies, a division of
            Invitrogen. Contact : Feng Liang Email : fliang@lifetech.com URL :
            http://fulllength.invitrogen.com/ InVitroGen Corporation 1600
            Faraday Avenue Genoscope sequence ID : CS0DI072CC03NP1.
FEATURES             Location/Qualifiers
     source          1..1201
                     /organism="Homo sapiens"
                     /mol_type="mRNA"
                     /db_xref="taxon:9606"
                     /clone="CS0DI072YF05"
                     /tissue_type="PLACENTA COT 25-NORMALIZED"
                     /clone_lib="Homo sapiens PLACENTA COT 25-NORMALIZED"
                     /note="1st strand cDNA was primed with a NotI-oligo(dT)
                     primer. Five prime end enriched, double-strand cDNA was
                     digested with Not I and cloned into the Not I and EcoR V
                     sites of the pCMVSPORT 6 vector. Library was normalized."
ORIGIN



Query Match 3.4%; Score 54; DB 13; Length 1201; 

Best Local Similarity 6.8%; Pred. No. 0.00042; 

Matches 45; Conservative 242; Mismatches 375; Indels 0; Gaps 0; 



Qy 450 GAGGGAGCCAGAGGGCCTCACATCAACAGAGGGTCTCTGAGCTCCCTGGAGCAAGGTTCG 509

I I : I I I : I | | | :::::::: | : | :: : | : : : : : | : : : : : 
Db 1192 GSGKGGGGCCSCGCCCCCCMMMMVMMMMGMMGVMMKNGGMMGMMMMMMMMGMMMMMBGB 1133 

Qy 510 GTCACGGGCACAGAGGCTCGGCACAGCTTAGGTGTCCTGCATGTGTCCTACAGCGTCAGG 569 

: I : : | : : : | : : | : | | : | : | | : : | : : | : : : : 

Db 1132 MVGKGVGKGKGGKKGMVKMMMGGCMSCVKKGGSNKGCBGCMGKGCGKKGCCMSKGKWKM 1073 

Qy 570 TAAGGGGACCTCCACAGCAAAAAGCTAGGCTCTCTGATTGCCTTTTCTGAATGGGTGGGT 629

: :: :: ::::| :::: : :|: : : : :::: | :::::::: |:: 
Db 1072 GBMMNKKKMMSKVMMMGKMMMMKBKMMGKKKKKNNKKKKBBKTVKKKKTVKKKBMKTGKK 1013 

Qy 630 GGGCCTGTGGGCTTTGGGTTGTCTGTCCAGCAGATCAGGGTGAAAGTGGACAGTCTGTAA 689 

: : : : I I : : : : I : : : : : : : ::::::: | :::: : 

Db 1012 KKKKGKMMGGBMMKVMMMGKGKMVHGKGKBMMBGKGKMM 953 

Qy 690 CAACAGTGAGTCGTTCCTCCTCCTCCTCCTGCGCAGGGCAGAGCCTGGACATTAAAACAT 749 

Db 952 BGMVNKKTMMMMBKKMNKKMNMMGKKVMMGK 893

Qy 750 GCCCTGCCTGAAGCCGCTTGCTGCTTCTCACTGATTTCTGCTCTCCCCTTCCTTGACTCG 809 

: : : : I : :: : : | ::::::::::::: : : : 
Db 892 KKNMMKKKGKKTGKKKKKKNHMMKKTMNMNKKKKMKMKKKKKKNMMVKGKMKGKMGKKKV 833



Qy 810 CCCACCACCTGTCCTGTGTAGATGGAGAAGGCTCGGAGAGTGGGGGTGCTGGGGGCACAA 869

: :::::::: | : : : : : : : | : : | | : | : : : : :

Db 832 MKGKGMMNMKKKMGKGBKMCMVK^ 773

Qy 870 AATGGAATGAACACTGCTGAAGGAATGCAGGGTTCACTTCAAGAAGAAAGCAGTGTGCAG 929

Db 772 NNNNNMMNNNNNKKNNKNAMN^ 713

Qy 930 GTGTACCATCTCCCAGTCAGAGACCCAGTAATCAGAGCAGCTAATGGGAGGCATGCTCCT 989

Db 712 AMMNMMNNNNKMK^^ 653

Qy 990 TGGGTGGTGGCCAACTTGTCATTATACCTCCAAGGACAACAGAGTGGTACATAAGGCTAA 1049

Db 652 NKNNNNMNNNMNMMNNNN^ 593

Qy 1050 AACAGAGTTGTCAACCTGTCCAGGGGCAACTGGGATGGGGTAGGGCTGGGAGCAGGGGTC 1109

Db 592 NMNNNNNMMMMMMMNN 533

Qy 1110 TG 1111

|

Db 532 NG 531



RESULT 19 

BX381961 

LOCUS       BX381961                1201 bp    mRNA    linear   EST 08-MAY-2003
DEFINITION  BX381961 Homo sapiens PLACENTA COT 25-NORMALIZED Homo sapiens cDNA
            clone CS0DI072YF05 3-PRIME, mRNA sequence.
ACCESSION   BX381961
VERSION     BX381961.1  GI:30453007
KEYWORDS    EST.
SOURCE      Homo sapiens (human)
  ORGANISM  Homo sapiens
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Primates; Catarrhini; Hominidae; Homo.
REFERENCE   1  (bases 1 to 1201)
  AUTHORS   Li,W.B., Gruber,C., Jessee,J. and Polayes,D.
  TITLE     Full-length cDNA libraries and normalization
  JOURNAL   Unpublished (2001)
COMMENT     Contact: Genoscope
            Genoscope - Centre National de Sequencage
            BP 191 91006 EVRY cedex - France
            Email: seqref@genoscope.cns.fr, Web : www.genoscope.cns.fr
            Library was constructed by Life Technologies, a division of
            Invitrogen. Contact : Feng Liang Email : fliang@lifetech.com URL :
            http://fulllength.invitrogen.com/ InVitroGen Corporation 1600
            Faraday Avenue Genoscope sequence ID : CS0DI072CC03NP1.
FEATURES             Location/Qualifiers
     source          1..1201
                     /organism="Homo sapiens"
                     /mol_type="mRNA"
                     /db_xref="taxon:9606"
                     /clone="CS0DI072YF05"
                     /tissue_type="PLACENTA COT 25-NORMALIZED"
                     /clone_lib="Homo sapiens PLACENTA COT 25-NORMALIZED"
                     /note="1st strand cDNA was primed with a NotI-oligo(dT)
                     primer. Five prime end enriched, double-strand cDNA was
                     digested with Not I and cloned into the Not I and EcoR V
                     sites of the pCMVSPORT 6 vector. Library was normalized."
ORIGIN

Query Match 3.1%; Score 49.2; DB 13; Length 1201; 

Best Local Similarity 4.4%; Pred. No. 0.012; 

Matches 32; Conservative 265; Mismatches 436; Indels 2; Gaps 1; 
Qy 578 CCTCCACAGCAAAAAGCTAGGCTCTCTGATTGCCTTTTCTGAATGGGTGGGTGGGCCTGT 637



Db 371 CNCNCMAAKCCNNACKANNNKKKMKACNANNNNNKCKMMMNNNKKKKKCMNKNKKKMMNK 430

Qy 638 GGGCTTTGGGTTGTCTGTCCAGCAGATCAGGGTGAAAGTGGACAGTCTGTAACAACAGTG 697 

Db 431 KNMNKKKKKKKKNNNNNCNANNMMNNKNKKKNNKNNNNTKNNMNNNNCNKMNNKNKKNNN 490

Qy 698 AGTCGTTCCTCCTCCTCCTCCTGCGCAGGGCAGAGCCTGGACATTAAAACATGCCCTGCC 757 



Db 491 NNKMNKNMMNKNNNCNNNKMCKMMNMMNKMMMMNKMMNNNCNKMMNAMNNKKMNKMMMNK 550

Qy 758 TGAAGCCGCTTGCTGCTTCTCACTGATTTCTGCTCTCCCCTTCCTTGACTCGCCCACCAC 817 

Db 551 NNMMNCKTNNNNNNNNKNNNNNKNNNNNKKKKKKKNNNNNKNNNMNNNNANKNNNNNNNN 610 

Qy 818 CTGTCCTGTGTAGATGGAGAAGGCTCGGAGAGTGGGGGTGCTGGGGGCACAAAATGGAAT 877

Db 611 NKNNCMNKKKKKKMKMMMMNKMNNNNNNKKNKNNNKNNNNMNKMNNNN 670 

Qy 878 GAACACTGCTGAAGGAATGCAGGGTTCACTTCAAGAAGAAAGCAGTGTGCAGGTGTACCA 937 

Db 671 NNNNNNNNKKKKKKKKNNNNNNNNNNNNNMKMNNNNKKNKKTNKKKKNNAKKNNTNNKMM 730

Qy 938 TCTCCCAGTCAGAGACCCAGTAATCAGAGCAGCTAATGGGAGGCATGCTCCTTGGGTGGT 997 

Db 731 NNKNNNNNNMMNNCNCNKKKNNKTNMNNMMNNNNNKKNNNNNKNNNNNNNNNNKKMKMSK 790 

Qy 998 GGCCAACTTGTCATTATACCTCCAAGGACAACAGAGTGGTACATAAGGCTAAAACAGAGT 1057

Db 791 CKKKKMCCKMCCMCKKK--KKMBKGKMVCMCKMMMKNKKCMCMKBMMMCKMCMKMCMBKK 848

Qy 1058 TGTCAACCTGTCCAGGGGCAACTGGGATGGGGTAGGGCTGGGAGCAGGGGTCTGGCACCT 1117 

Db 849 NMMMMMMPCMKMMMMNKNKAMMKKDNMMMMMMCAM 908

Qy 1118 TCCAGGACCCTACTCTGCCTTTGCCCTTGTGGGATTTCCTTTAAAGCAACCGTGTCGGGC 1177 

: :::::::: | :::: :: : : :::::::::::::: ::: : 

Db 909 KKKHKKNMMKKMMMCKKBMMCKKNKMMNKMMVKK 968 

Qy 1178 CTTGGTGGAACATCAAATCATGCCAGCAGAAGTGGGACAGGCAAATCCTCAAAGATGTCT 1237

Db 969 MCAKKKKMCMCVKKVTVICMCD 1028

Qy 1238 CCTTGTACATCGAGAGTGGCCAGATTATGTGCATCTTAGGCAGCTCAGGTAAGTGCCTGG 1297 

Db 1029 BAMWMMMMNNMMMMMC 1088



Qy 1298 GGGGSCSGGGGCTCC 1312 

:: i : I : I : 
Db 1089 CMCKGCVGCMNSCCM 1103 



RESULT 20
CNS03RDA
LOCUS       CNS03RDA                 925 bp    DNA     linear   GSS 01-SEP-2000
DEFINITION  Tetraodon nigroviridis genome survey sequence PUC-Ori end of clone
            049I12 of library G from Tetraodon nigroviridis, genomic survey
            sequence.
ACCESSION   AL257095
VERSION     AL257095.1  GI:7978107
KEYWORDS    GSS; genome survey sequence.
SOURCE      Tetraodon nigroviridis
  ORGANISM  Tetraodon nigroviridis
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Actinopterygii; Neopterygii; Teleostei; Euteleostei; Neoteleostei;
            Acanthomorpha; Acanthopterygii; Percomorpha; Tetraodontiformes;
            Tetradontoidea; Tetraodontidae; Tetraodon.
REFERENCE   1
  AUTHORS   Roest Crollius,H., Jaillon,O., Dasilva,C., Bouneau,L., Fisher,C.,
            Bernot,A., Fizames,C., Wincker,P., Brottier,P., Quetier,F.,
            Saurin,W. and Weissenbach,J.
  TITLE     Estimate of human gene number provided by genome-wide analysis
            using Tetraodon nigroviridis DNA sequence
  JOURNAL   Nat. Genet. 25 (2), 235-238 (2000)
  MEDLINE   20296633
   PUBMED   10835645
REFERENCE   2
  AUTHORS   Roest Crollius,H., Jaillon,O., Dasilva,C., Ozouf-Costaz,C.,
            Fizames,C., Fischer,C., Bouneau,L., Billault,A., Quetier,F.,
            Saurin,W., Bernot,A. and Weissenbach,J.
  TITLE     Characterization and repeat analysis of the compact genome of the
            freshwater pufferfish Tetraodon nigroviridis
  JOURNAL   Genome Res. 10 (7), 939-949 (2000)
  MEDLINE   20359837
   PUBMED   10899143
REFERENCE   3  (bases 1 to 925)
  AUTHORS   Genoscope.
  TITLE     Direct Submission
  JOURNAL   Submitted (12-APR-2000) Genoscope - Centre National de Sequencage :
            BP 191 91006 EVRY cedex - FRANCE (E-mail : seqref@genoscope.cns.fr
            - Web : www.genoscope.cns.fr)
COMMENT     This sequence is a single read and was generated as part of a large
            scale clone-end sequencing project of the Tetraodon nigroviridis
            genome. For more information, please take a look at
            http://www.genoscope.cns.fr/Tetraodon.
FEATURES             Location/Qualifiers
     source          1..925
                     /organism="Tetraodon nigroviridis"
                     /mol_type="genomic DNA"
                     /db_xref="taxon:99883"
                     /clone="049I12"
                     /clone_lib="G"
                     /note="Genoscope sequence ID : C0BG049BE06SP1~end :
                     PUC-Ori"
ORIGIN



Query Match 3.1%; Score 48.2; DB 29; Length 925;
Best Local Similarity 72.9%; Pred. No. 0.02;
Matches 62; Conservative 0; Mismatches 23; Indels 0; Gaps 0;

Qy       1210 TGGGACAGGCAAATCCTCAAAGATGTCTCCTTGTACATCGAGAGTGGCCAGATTATGTGC 1269
              |||    | || || ||||| || ||||||||  || | ||||| ||||||||| ||| ||
Db          7 TGGACTCGTCAGATTCTCAACGACGTCTCCTTCCACGTGGAGAGCGGCCAGATCATGGGC 66

Qy       1270 ATCTTAGGCAGCTCAGGTAAGTGCC 1294
              ||| | |||| |||||||  |  ||
Db         67 ATCCTGGGCAACTCAGGTCTGCCCC 91



RESULT 21 

BX376097/c 

LOCUS       BX376097                1201 bp    mRNA    linear   EST 08-MAY-2003
DEFINITION  BX376097 Homo sapiens NEUROBLASTOMA COT 25-NORMALIZED Homo sapiens
            cDNA clone CS0DC022YM12 5-PRIME, mRNA sequence.
ACCESSION   BX376097
VERSION     BX376097.1  GI:30434756
KEYWORDS    EST.
SOURCE      Homo sapiens (human)
  ORGANISM  Homo sapiens
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Primates; Catarrhini; Hominidae; Homo.
REFERENCE   1  (bases 1 to 1201)
  AUTHORS   Li,W.B., Gruber,C., Jessee,J. and Polayes,D.
  TITLE     Full-length cDNA libraries and normalization
  JOURNAL   Unpublished (2001)
COMMENT     Contact: Genoscope
            Genoscope - Centre National de Sequencage
            BP 191 91006 EVRY cedex - France
            Email: seqref@genoscope.cns.fr, Web : www.genoscope.cns.fr
            Library was constructed by Life Technologies, a division of
            Invitrogen. This sequence belongs to sequence cluster 2866.
            Contact : Feng Liang Email : fliang@lifetech.com URL :
            http://fulllength.invitrogen.com/ InVitroGen Corporation 1600
            Faraday Avenue Genoscope sequence ID : CS0DC022BG06QP1.
FEATURES             Location/Qualifiers
     source          1..1201
                     /organism="Homo sapiens"
                     /mol_type="mRNA"
                     /db_xref="taxon:9606"
                     /clone="CS0DC022YM12"
                     /tissue_type="NEUROBLASTOMA COT 25-NORMALIZED"
                     /clone_lib="Homo sapiens NEUROBLASTOMA COT 25-NORMALIZED"
                     /note="1st strand cDNA was primed with a NotI-oligo(dT)
                     primer. Five prime end enriched, double-strand cDNA was
                     digested with Not I and cloned into the Not I and EcoR V
                     sites of the pCMVSPORT 6 vector. Library was normalized."
ORIGIN



Query Match 2.9%; Score 46; DB 13; Length 1201; 

Best Local Similarity 11.5%; Pred. No. 0.11; 



Matches 79; Conservative 262; Mismatches 345; Indels 3; Gaps 2; 

Qy 825 GTGTAGATGGAGAAGGCTCGGAGAGTGGGGGTGCTGGGGGCACAAAATGGAATGAACACT 884

| : | : : : : | | | | : | | | : | | : | | | : : | | | | : : : | | :

Db 1142 GARAAKRKARAGKAGKARGAAAAAGKAGAGKAGAGGGARRKAKAGAGTARWGKTAAAGKW 1083

Qy 885 GCTGAAGGAATGCAGGGTTCACTTCAAGAAGAAAGCAGTGTGCAGGTGTACCATCTCCCA 944

: : I I : : : : I : I : I : I I I I I : I : : I : : : I :: : : 

Db 1082 RKARAAKKKRKADAADGGKAAKAGKWAGAAAGAGKAGGGKK^ 1023 

Qy 945 GTCAGAGACCCAGTAATCAGAGCAGCTAATGGGAGGCATGCTCCTTGGGTGGTGGCCAAC 1004

:::::::: :::::::::::: : ::::::: | | : : : : :
Db 1022 MNMMMMMMMMMM--MMMMMMMMMMMMMMMHKMHMHMKMKKHMMHKTMTNTKWMKTKTMMM 965

Qy 1005 TTGTCATTATACCTCCAAGGACAACAGAGTGGTACATAAGGCTAAAACAGAGTTGTCAAC 1064

Db 964 MMMMMMMVKKMMMKMMMMKMMMMMNYMMMTGM 905 

Qy 1065 CTGTCCAGGGGCAACTGGGATGGGGTAGGGCTGGGAGCAGGGGTCTGGCACCTTCCAGGA 1124 

: : : : I : I I : III: : I I I : : I : I : : I : : : : : : 

Db 904 MMKMMMGGKGGTMGGMGGGMMMVGMRGGGvMVIMGTGKK™^ 845

Qy 1125 CCCTACTCTGCCTTTGCCCTTGTGGGATTTCCTTTAAAGCAACCGTGTCGGGCCTTGGTG 1184 

: I : : : : : I I : : I : i : : : : I : : : I I : I I I 

Db 844 KTMTMKMMMKHMTMTKGNGKTMHGKMMN^ 785 

Qy 1185 GAACATCAAATCATGCCAGCAGAAGTGGGACAGGCAAATCCTCAAAGATGTCTCCTTGTA 1244

| | | | | | : | : : : : : | | : : | : : : : | : : : : : : : :

Db 784 GAAAAAKAGAK-AGKHMNKNMGKTMMGTMMMGTGK^ 726

Qy 1245 CATCGAGAGTGGCCAGATTATGTGCATCTTAGGCAGCTCAGGTAAGTGCCTGGGGGGSCS 1304

Db 725 KGKMMMMKMGKMNMMAMMKMKMMTMTMTGTMMM™ 666 

Qy 1305 GGGGCTCCTGTACTTCTAAGGCAGGCTCTGGGAGGCTTTGGCTCYGTCTAAGCACAATGT 1364 

Db 665 MNANMKKNNKNAMNNNGKMNNMKKKTKKKNNMKKKNMKKNNNNMMNNKMNMMKMGP 606 

Qy 1365 TTAAGAAGTRAGTTTAAGTTGTAGAGAGGCAGCCATGCATTTGGCATTTGAATACAATCT 1424 

Db 605 KAKANNMKMKMNNNKMMMNNMKMKKKMK 546 

Qy 1425 GGTGACTTGTCTGGCTGCCAATAGAACCTAGTACCAAAGTGAAATCTTGAGGAAAATCCC 1484 

Db 545 KKMGKNNNMNAGKKMKMMMMMMNMNMNMMMNNMMNMNMKKNNNNMNMKKMNNMMNMMMAM 486

Qy 1485 TGGAAAGAGTGGAAAGTCCTGCCTAACAC 1513

: | : : : : : : | : : | : :

Db 485 AMMMGNNNGKKMNMMMKMGNGMMMNANMM 457



RESULT 22 
BY252099 

LOCUS       BY252099                 432 bp    mRNA    linear   EST 10-DEC-2002
DEFINITION  BY252099 RIKEN full-length enriched, visual cortex Mus musculus
            cDNA clone K230342H21 5', mRNA sequence.
ACCESSION   BY252099
VERSION     BY252099.1  GI:26433611
KEYWORDS    EST.
SOURCE      Mus musculus (house mouse)
  ORGANISM  Mus musculus
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Rodentia; Sciurognathi; Muridae; Murinae; Mus.
REFERENCE   1  (bases 1 to 432)
  AUTHORS   Okazaki,Y., Furuno,M., Kasukawa,T., Adachi,J., Bono,H., Kondo,S.,
            Nikaido,I., Osato,N., Saito,R., Suzuki,H., Yamanaka,I.,
            Kiyosawa,H., Yagi,K., Tomaru,Y., Hasegawa,Y., Nogami,A.,
            Schonbach,C., Gojobori,T., Baldarelli,R., Hill,D.P., Bult,C.,
            Hume,D.A., Quackenbush,J., Schriml,L.M., Kanapin,A., Matsuda,H.,
            Batalov,S., Beisel,K.W., Blake,J.A., Bradt,D., Brusic,V.,
            Chothia,C., Corbani,L.E., Cousins,S., Dalla,E., Dragani,T.A.,
            Fletcher,C.F., Forrest,A., Frazer,K.S., Gaasterland,T.,
            Gariboldi,M., Gissi,C., Godzik,A., Gough,J., Grimmond,S.,
            Gustincich,S., Hirokawa,N., Jackson,I.J., Jarvis,E.D., Kanai,A.,
            Kawaji,H., Kawasawa,Y., Kedzierski,R.M., King,B.L., Konagaya,A.,
            Kurochkin,I.V., Lee,Y., Lenhard,B., Lyons,P.A., Maglott,D.R.,
            Maltais,L., Marchionni,L., McKenzie,L., Miki,H., Nagashima,T.,
            Numata,K., Okido,T., Pavan,W.J., Pertea,G., Pesole,G.,
            Petrovsky,N., Pillai,R., Pontius,J.U., Qi,D., Ramachandran,S.,
            Ravasi,T., Reed,J.C., Reed,D.J., Reid,J., Ring,B.Z., Ringwald,M.,
            Sandelin,A., Schneider,C., Semple,C.A., Setou,M., Shimada,K.,
            Sultana,R., Takenaka,Y., Taylor,M.S., Teasdale,R.D., Tomita,M.,
            Verardo,R., Wagner,L., Wahlestedt,C., Wang,Y., Watanabe,Y.,
            Wells,C., Wilming,L.G., Wynshaw-Boris,A., Yanagisawa,M., Yang,I.,
            Yang,L., Yuan,Z., Zavolan,M., Zhu,Y., Zimmer,A., Carninci,P.,
            Hayatsu,N., Hirozane-Kishikawa,T., Konno,H., Nakamura,M.,
            Sakazume,N., Sato,K., Shiraki,T., Waki,K., Kawai,J., Aizawa,K.,
            Arakawa,T., Fukuda,S., Hara,A., Hashizume,W., Imotani,K., Ishii,Y.,
            Itoh,M., Kagawa,I., Miyazaki,A., Sakai,K., Sasaki,D., Shibata,K.,
            Shinagawa,A., Yasunishi,A., Yoshino,M., Waterston,R., Lander,E.S.,
            Rogers,J., Birney,E. and Hayashizaki,Y.
  TITLE     Analysis of the mouse transcriptome based on functional annotation
            of 60,770 full-length cDNAs
  JOURNAL   Nature 420, 563-573 (2002)
  MEDLINE   22354683
   PUBMED   12466851
COMMENT     Contact: Yoshihide Hayashizaki
            Laboratory for Genome Exploration Research Group, RIKEN Genomic
            Sciences Center (GSC), Yokohama Institute
            The Institute of Physical and Chemical Research (RIKEN)
            1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa 230-0045, Japan
            Tel: 81-45-503-9222
            Fax: 81-45-503-9216
            Email: genome-res@gsc.riken.go.jp,
            URL: http://genome.gsc.riken.go.jp/
            Aizawa,K., Akimura,T., Arakawa,T., Carninci,P., Fukuda,S.,
            Hirozane,T., Imotani,K., Ishii,Y., Itoh,M., Kawai,J., Konno,H.,
            Miyazaki,A., Murata,M., Nakamura,M., Nomura,K., Numazaki,R.,
            Ohno,M., Sakai,K., Sakazume,N., Sasaki,D., Sato,K., Shibata,K.,
            Shiraki,T., Tagami,M., Waki,K., Watahiki,A., Muramatsu,M. and
            Hayashizaki,Y. Direct Submission
            Computational Analysis of Full-Length Mouse cDNAs Compared with
            Human Genome Sequences Mamm. Genome. 12, 673-677 (2001)
            Normalization and subtraction of cap-trapper-selected cDNAs to
            prepare full-length cDNA libraries for rapid discovery of new
            genes. Genome Res. 10 (10), 1617-1630 (2000)
            RIKEN integrated sequence analysis (RISA) system -- 384-format
            sequencing pipeline with 384 multicapillary sequencer. Genome Res.
            10 (11), 1757-1771 (2000)
            Computer-based methods for the mouse full-length cDNA
            encyclopedia: real-time sequence clustering for construction of a
            nonredundant cDNA library. Genome Res. 11 (2), 281-289 (2001)
            cDNA library was prepared and sequenced in Mouse Genome
            Encyclopedia Project of Genome Exploration Research Group in Riken
            Genomic Sciences Center and Genome Science Laboratory in RIKEN.
            Division of Experimental Animal Research in Riken contributed to
            prepare mouse tissues.
            Tissues were provided by Michela Fagiolini and Takao K. Hensch (
            Laboratory for Neuronal Circuit Development Brain Science Institute
            RIKEN 2-1 Hirosawa, Wako-shi, Saitama 351-0198 Japan ) whose
            assistance we gratefully acknowledge. Please visit our web site
            (http://genome.gsc.riken.go.jp) for further details.
FEATURES             Location/Qualifiers
     source          1..432
                     /organism="Mus musculus"
                     /mol_type="mRNA"
                     /strain="C57BL/6J"
                     /db_xref="taxon:10090"
                     /clone="K230342H21"
                     /tissue_type="visual cortex"
                     /clone_lib="RIKEN full-length enriched, visual cortex"
ORIGIN



Query Match 2.8%; Score 43.8; DB 13; Length 432;
Best Local Similarity 48.4%; Pred. No. 0.26;
Matches 120; Conservative 0; Mismatches 128; Indels 0; Gaps 0;

Qy        544 TCCTGCATGTGTCCTACAGCGTCAGGTAAGGGGACCTCCACAGCAAAAAGCTAGGCTCTC 603
              |||||||  |   || |    ||||       |  ||   |    ||  ||| |||| ||
Db         20 TCCTGCAGCTTCTCTTCCAGTTCAGCGGGTCAGGGCTGGTCGCGGAATCGCTCGGCTGTC 79

Qy        604 TGATTGCCTTTTCTGAATGGGTGGGTGGGCCTGTGGGCTTTGGGTTGTCTGTCCAGCAGA 663
                 | |||   ||      |||| |   |     || | |  |   |||||| | |  ||
Db         80 ACCTCGCCAGCTCGTTGCCGGTGCGCTCGGGGNGGGCCGTCCGTCCGTCTGTACGGAGGA 139

Qy        664 TCAGGGTGAAAGTGGACAGTCTGTAACAACAGTGAGTCGTTCCTCCTCCTCCTCCTGCGC 723
              | ||| || |   ||       || | ||| | |   | |||||||           ||
Db        140 TGAGGCTGGAGTGGGCGGAGGCGTGAGAACCGAGTTACTTTCCTCCCGAGGTGGAGCCGG 199

Qy        724 AGGGCAGAGCCTGGACATTAAAACATGCCCTGCCTGAAGCCGCTTGCTGCTTCTCACTGA 783
              |||   || | |   | | ||| ||  | | ||| |   |  || || || | |   | |
Db        200 AGGATTGAACTTCACCCTGAAACCACCCTCCGCCGGTCCCACCTGGCCGCCTTTACGTAA 259

Qy        784 TTTCTGCT 791
                ||| ||
Db        260 CCTCTCCT 267



RESULT 23 
CB424734/c 

LOCUS       CB424734                 294 bp    mRNA    linear   EST 25-MAR-2003
DEFINITION  599034 MARC 6BOV Bos taurus cDNA 3', mRNA sequence.
ACCESSION   CB424734
VERSION     CB424734.1  GI:29195073
KEYWORDS    EST.
SOURCE      Bos taurus (cow)
  ORGANISM  Bos taurus
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Cetartiodactyla; Ruminantia; Pecora; Bovoidea;
            Bovidae; Bovinae; Bos.
REFERENCE   1  (bases 1 to 294)
  AUTHORS   Smith,T.P.L., Roberts,A.J., Echternkamp,S.E., Chitko-McKown,C.G.,
            Wray,J.E. and Keele,J.W.
  TITLE     A second set of bovine ESTs from pooled-tissue normalized libraries
  JOURNAL   Unpublished (2003)
COMMENT     Contact: Smith TPL
            USDA, ARS, US Meat Animal Research Center
            PO Box 166, Clay Center, NE 68933-0166, USA
            Tel: 402 762 4366
            Fax: 402 762 4390
            Email: smith@email.marc.usda.gov
            Single pass sequencing. Bases called with phred v0.020425.c and
            trimmed with the aid of the trim_alt option. Vector identified with
            crossmatch v0.990329.
            Plate: FQY8007 row: I column: 15
            Seq primer: TAGAAGGCACAGTCGAGG.
FEATURES             Location/Qualifiers
     source          1..294
                     /organism="Bos taurus"
                     /mol_type="mRNA"
                     /db_xref="taxon:9913"
                     /tissue_type="pooled"
                     /lab_host="DH10B"
                     /clone_lib="MARC 6BOV"
                     /note="Vector: pcDNA3.1; Site_1: EcoRI; Site_2: NotI;
                     Library made with RNA pooled from multiple tissues
                     including liver, lung, hypothalamus, pituitary, and
                     placenta/endometrium."
ORIGIN



Query Match 2.7%; Score 43; DB 14; Length 294;
Best Local Similarity 49.8%; Pred. No. 0.35;
Matches 106; Conservative 1; Mismatches 106; Indels 0; Gaps 0;

Qy       1343 TGGCTCYGTCTAAGCACAATGTTTAAGAAGTRAGTTTAAGTTGTAGAGAGGCAGCCATGC 1402
              ||||| :|  | | || |||| ||||   || |  |  |  |||||           |||
Db        268 TGGCTGTGATTTAACAAAATGATTAAAGTGTTACCTACATGTGTAGCCGAAGTAGTGTGC 209

Qy       1403 ATTTGGCATTTGAATACAATCTGGTGACTTGTCTGGCTGCCAATAGAACCTAGTACCAAA 1462
              | |  |   ||    |  |  |||| |  | | |||     || | || | |  |  | |
Db        208 AGTGAGGTGTTTCTGAATACATGGTCAGATTTTTGGAAAAAAACAAAAACAAAAAAAACA 149

Qy       1463 GTGAAATCTTGAGGAAAATCCCTGGAAAGAGTGGAAAGTCCTGCCTAACACGTAAGTGCC 1522
                 |||  |  |  |  ||||   || | | | | ||||  ||    | |  | | ||
Db        148 AGTAAAGTTCAACAACCATCCAACGAGAAAATTGCAAGTAGTGTGACAGAGCTGATTGAT 89

Qy       1523 TTCTTTGCTTGTTTGATTGACTGTGATGCTAGA 1555
              ||  ||||||  ||||||   | |  |   |||
Db         88 TTTGTTGCTTTCTTGATTTTTTTTTTTTTCAGA 56



RESULT 24 

CNS005TE 

LOCUS       CNS005TE                 997 bp    DNA     linear   GSS 03-JUN-1999
DEFINITION  Drosophila melanogaster genome survey sequence TET3 end of BAC #
            BACR12K22 of RPCI-98 library from Drosophila melanogaster (fruit
            fly), genomic survey sequence.
ACCESSION   AL060767
VERSION     AL060767.1  GI:4943573
KEYWORDS    GSS.
SOURCE      Drosophila melanogaster (fruit fly)
  ORGANISM  Drosophila melanogaster
            Eukaryota; Metazoa; Arthropoda; Hexapoda; Insecta; Pterygota;
            Neoptera; Endopterygota; Diptera; Brachycera; Muscomorpha;
            Ephydroidea; Drosophilidae; Drosophila.
REFERENCE   1  (bases 1 to 997)
  AUTHORS   Genoscope.
  TITLE     Direct Submission
  JOURNAL   Submitted (02-JUN-1999) Genoscope - Centre National de Sequencage :
            BP 191 91006 EVRY cedex - FRANCE (E-mail : seqref@genoscope.cns.fr
            - Web : www.genoscope.cns.fr)
COMMENT     Determination of this BAC-end sequence was carried out as part of a
            collaboration with the Berkeley Drosophila Genome Project (BDGP).
            The BDGP is constructing a physical map of the Drosophila
            melanogaster genome using these BACs. For further information
            please see http://www.fruitfly.org The BDGP Drosophila
            melanogaster BAC library was prepared by Kazutoyo Osoegawa and
            Aaron Mammoser in Pieter de Jong's laboratory in the Department of
            Cancer Genetics at the Roswell Park Cancer Institute in Buffalo,
            NY. The library is named RPCI-98 and was constructed by partial
            EcoRI digestion of Drosophila DNA provided by the BDGP from the
            isogenic strain y2; cn bw sp, the same strain used for the BDGP's
            P1 and EST libraries. A more detailed description of the library
            and how to order individual BAC clones, the entire library, or
            filters for hybridization from the BACPAC Resource Center can be
            found at http://bacpac.med.buffalo.edu/drosophila_bac.htm.
FEATURES             Location/Qualifiers
     source          1..997
                     /organism="Drosophila melanogaster"
                     /mol_type="genomic DNA"
                     /db_xref="taxon:7227"
                     /clone="BACR12K22"
                     /clone_lib="RPCI-98"
                     /note="end : TET3"
ORIGIN



Query Match 2.7%; Score 42.4; DB 29; Length 997; 

Best Local Similarity 25.2%; Pred. No. 1.1; 

Matches 53; Conservative 70; Mismatches 87; Indels 0; Gaps 0; 

Qy 22 CCCATTCCACAGCTGGGTCTCTTCTTTGGTTTTCTCAGCCATGACCAGTGCTGTTTGTGC 81 

I : I : : I I : I : I : : : : : I I : : I : I : I : : I I : I : : : I : : : 
Db 772 CYCYYYCCYYYYCYTCYTYYYYYCTYYYYTYTYTTYTYYCYTYTYTTCTYYYTYTYYYCY 831



Qy 82 CCTTTGTGTGGCCTCCCCTGCTGTTGGGCTCTCTCTGTCTTTGCTCCTTAGAGCTGGGGC 141

Db 832 CYCYYCTYCCTCYTYYYCTYCYYYYYCYYYYTCYTYTTMTYTYYYTYTYTYTYTYTHYTT 891

Qy 142 ACCTGAGCCCTCCTCTGTGCCAGCCTTTCTCCCAGCATTCCTYTCTGGC7^AACACTTCCT 201 

I | | : : | M : : : I I : : : : : : : I I : I I I I I : I : I I 

Db 892 YTTTTYYCCYCYCCCTSYCYCTYCTYYCTYYYYTYYYTTTYTYTCTCTYYTCTYTTYTCT 951

Qy 202 ATAAACACACCGTGTGTTCTGCCTATTGTC 231 

I : : : I I : I I : I :: : I :: 
Db 952 YTCYTYYYYTYYTYTCYTCYYCYYYYTCYY 981 



RESULT 25 
AA543856/c
LOCUS       AA543856                 629 bp    mRNA    linear   EST 01-AUG-1997
DEFINITION  vk34a07.r1 Soares_mammary_gland_NbMMG Mus musculus cDNA clone
            IMAGE:948468 5', mRNA sequence.
ACCESSION   AA543856
VERSION     AA543856.1  GI:2292333
KEYWORDS    EST.
SOURCE      Mus musculus (house mouse)
  ORGANISM  Mus musculus
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Rodentia; Sciurognathi; Muridae; Murinae; Mus.
REFERENCE   1  (bases 1 to 629)
  AUTHORS   Marra,M., Hillier,L., Allen,M., Bowles,M., Dietrich,N., Dubuque,T.,
            Geisel,S., Kucaba,T., Lacy,M., Le,M., Martin,J., Morris,M.,
            Schellenberg,K., Steptoe,M., Tan,F., Underwood,K., Moore,B.,
            Theising,B., Wylie,T., Lennon,G., Soares,B., Wilson,R. and
            Waterston,R.
  TITLE     The WashU-HHMI Mouse EST Project
  JOURNAL   Unpublished (1996)
COMMENT     Contact: Marra M/Mouse EST Project
            WashU-HHMI Mouse EST Project
            Washington University School of Medicine
            4444 Forest Park Parkway, Box 8501, St. Louis, MO 63108
            Tel: 314 286 1800
            Fax: 314 286 1810
            Email: mouseest@watson.wustl.edu
            This clone is available royalty-free through LLNL; contact the
            IMAGE Consortium (info@image.llnl.gov) for further information.
            MGI:545324
            Seq primer: -28m13 rev2 ET from Amersham
            High quality sequence stop: 459.
FEATURES             Location/Qualifiers
     source          1..629
                     /organism="Mus musculus"
                     /mol_type="mRNA"
                     /strain="C57BL/6J"
                     /db_xref="taxon:10090"
                     /clone="IMAGE:948468"
                     /sex="male"
                     /tissue_type="mammary gland"
                     /dev_stage="4 weeks"
                     /lab_host="DH10B"
                     /clone_lib="Soares_mammary_gland_NbMMG"
                     /note="Organ: mammary gland; Vector: pT7T3D-Pac
                     (Pharmacia) with a modified polylinker; Site_1: Not I;
                     Site_2: Eco RI; 1st strand cDNA was primed with a Not I -
                     oligo(dT) primer [5'
                     TGTTACCAATCTGAAGTGGGAGCGGCCGCGAATGGTTTTTTTTTTTTTTTTTTTTTTT
                     T 3']; double-stranded cDNA was ligated to Eco RI
                     adaptors (Pharmacia), digested with Not I and cloned into
                     the Not I and Eco RI sites of the modified pT7T3 vector.
                     RNA provided by Dr. Minoru Ko, Wayne State Univ. Library
                     constructed and normalized by Bento Soares and M.Fatima
                     Bonaldo."
ORIGIN

Query Match 2.6%; Score 40.6; DB 9; Length 629;
Best Local Similarity 55.2%; Pred. No. 2.9;
Matches 79; Conservative 0; Mismatches 64; Indels 0; Gaps 0;

Qy        408 TGCTTCCTGCTAGCCATGGGTGAGCTGCCCTTTCTGAGTCCAGAGGGAGCCAGAGGGCCT 467
              | | |  | ||||||| |      ||  ||||    || |     |||| | |||   |
Db        580 TCCATTGTCCTAGCCAGGAAGTGCCTAGCCTTAGACAGACACAGTGGAGTCTGAGTCACA 521

Qy        468 CACATCAACAGAGGGTCTCTGAGCTCCCTGGAGCAAGGTTCGGTCACGGGCACAGAGGCT 527
              ||   || |  ||  |||||||||| ||||   || || |||     ||| || | |
Db        520 CAGTCCATCTCAGCCTCTCTGAGCTTCCTGAGACATGGATCGAGACAGGGTACGGCGCAG 461

Qy        528 CGGCACAGCTTAGGTGTCCTGCA 550
               ||| | | || | || |  |||
Db        460 GGGCCCGGGTTTGCTGACTGGCA 438



RESULT 26
CB417759
LOCUS       CB417759                 294 bp    mRNA    linear   EST 25-MAR-2003
DEFINITION  590490 MARC 6BOV Bos taurus cDNA 5', mRNA sequence.
ACCESSION   CB417759
VERSION     CB417759.1  GI:29181135
KEYWORDS    EST.
SOURCE      Bos taurus (cow)
  ORGANISM  Bos taurus
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Cetartiodactyla; Ruminantia; Pecora; Bovoidea;
            Bovidae; Bovinae; Bos.
REFERENCE   1  (bases 1 to 294)
  AUTHORS   Smith,T.P.L., Roberts,A.J., Echternkamp,S.E., Chitko-McKown,C.G.,
            Wray,J.E. and Keele,J.W.
  TITLE     A second set of bovine ESTs from pooled-tissue normalized libraries
  JOURNAL   Unpublished (2003)
COMMENT     Contact: Smith TPL
            USDA, ARS, US Meat Animal Research Center
            PO Box 166, Clay Center, NE 68933-0166, USA
            Tel: 402 762 4366
            Fax: 402 762 4390
            Email: smith@email.marc.usda.gov
            Single pass sequencing. Bases called with phred v0.020425.c and
            trimmed with the aid of the trim_alt option. Vector identified with
            crossmatch v0.990329.
            Plate: FQY8007 row: I column: 15
            Seq primer: GTAATACGACTCACTATAGGG.
FEATURES             Location/Qualifiers
     source          1..294
                     /organism="Bos taurus"
                     /mol_type="mRNA"
                     /db_xref="taxon:9913"
                     /tissue_type="pooled"
                     /lab_host="DH10B"
                     /clone_lib="MARC 6BOV"
                     /note="Vector: pcDNA3.1; Site_1: EcoRI; Site_2: NotI;
                     Library made with RNA pooled from multiple tissues
                     including liver, lung, hypothalamus, pituitary, and
                     placenta/endometrium."
ORIGIN



Query Match 2.5%; Score 39.8; DB 14; Length 294;

Best Local Similarity 48.8%; Pred. No. 3.2;

Matches 104; Conservative 1; Mismatches 108; Indels 0; Gaps 0;



Qy 1343 TGGCTCYGTCTAAGCACAATGTTTAAGAAGTRAGTTTAAGTTGTAGAGAGGCAGCCATGC 1402

I I I I : I I I I I I I I I I I I I Ml I I I I I I I III 
Db 27 TGGGTGTGATTTAACAAAATGATTAAAGTGTTACCTACATGTGTAGCCGAAGTAGTGTGC 86

Qy 1403 ATTTGGCATTTGAATACAATCTGGTGACTTGTCTGGCTGCCAATAGAACCTAGTACCAAA 1462

III II I I I I I I I I I I I I I I I I I I I I I I 

Db 87 AGTGAGGTGTTTCTGAATACATGGTCAGATTTTTGGAAAAAAACAAAAACAAAAAAAACA 146



Qy 1463 GTGAAATCTTGAGGAAAATCCCTGGAAAGAGTGGAAAGTCCTGCCTAACACGTAAGTGCC 1522
III I I I I I I I I I I I I I I I I II II I I I I
Db 147 AGTAAAGTTCAACAACCATCCAACGAGAAAATTGCAAGGAGTGTGACAGAGCTGATTGAT 206

Qy 1523 TTCTTTGCTTGTTTGATTGACTGTGATGCTAGA 1555
II I I I I I I I I I I I I II I III
Db 207 TTTGTTGCTTTCTTGATTTTTTTTTTTTTCAGA 239



RESULT 27

AQ977239/c

LOCUS AQ977239 455 bp DNA linear GSS 29-JAN-2000
DEFINITION RPCI-23-319D17.TJ RPCI-23 Mus musculus genomic clone
RPCI-23-319D17, genomic survey sequence.
ACCESSION AQ977239
VERSION AQ977239.1 GI:6809540
KEYWORDS GSS.
SOURCE Mus musculus (house mouse)
ORGANISM Mus musculus
Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
Mammalia; Eutheria; Rodentia; Sciurognathi; Muridae; Murinae; Mus.
REFERENCE 1 (bases 1 to 455)
AUTHORS Zhao,S., Nierman,W., Feldblyum,T., Malek,J., Shatsman,S.,
Akinret,B., Levins,M., Mcgann,S., Tsegaye,G., Geer,K., Krol,M., de
Jong,P. and Fraser,C.M.
TITLE Mouse BAC End Sequences from Library RPCI-23
JOURNAL Unpublished (1999)
COMMENT Contact: Shaying Zhao
Department of Eukaryotic Genomics
The Institute for Genomic Research
9712 Medical Center Dr., Rockville, MD 20850, USA
Tel: 301 838 0200
Fax: 301 838 0208
Email: szhao@tigr.org
Clones are derived from the mouse BAC library RPCI-23. For BAC
library availability, please contact Pieter de Jong
(pieter@dejong.med.buffalo.edu). Clones may be purchased from
BACPAC Resources (http://bacpac.med.buffalo.edu/orderingframe.htm)
or from Research Genetics (info@resgen.com). BAC end page:
http://www.tigr.org/tdb/bac_ends/mouse/bac_end_intro.html
Plate: 319 row: D column: 17
Seq primer: SP6
Class: BAC ends.
FEATURES Location/Qualifiers
source 1..455
/organism="Mus musculus"
/mol_type="genomic DNA"
/strain="C57BL/6J"
/db_xref="taxon:10090"
/clone="RPCI-23-319D17"
/sex="Female"
/lab_host="DH10B"
/clone_lib="RPCI-23"
/note="Organ: Kidney/Brain; Vector: pBACe3.6; Site_1:
EcoRI; Site_2: EcoRI; Female C57BL/6J mouse kidney and/or
brain genomic DNA was isolated and partially digested
with a combination of EcoRI and EcoRI Methylase. Size
selected DNA was cloned into the pBACe3.6 vector at the
EcoRI sites. The ligation products were transformed into
DH10B electrocompetent cells (BRL Life Technologies)."



ORIGIN 



Query Match 2.5%; Score 39.8; DB 28; Length 455;

Best Local Similarity 51.4%; Pred. No. 4.2;

Matches 89; Conservative 1; Mismatches 83; Indels 0; Gaps 0;



Qy 46 TTTGGTTTTCTCAGCCATGACCAGTGCTGTTTGTGCCCTTTGTGTGGCCTCCCCTGCTGT 105
I I I I I I III II I I I I I I I I I I I I I I I III
Db 220 TGTTGTTGTTGTAGCTGTAATTGGTGGTTTTAGTGGTAACGTCTCTCCCTCTCCCTCTCT 161

Qy 106 TGGGCTCTCTCTGTCTTTGCTCCTTAGAGCTGGGGCACCTGAGCCCTCCTCTGTGCCAGC 165
I I I I I I I II I I I I I II III I I I I I I I I I
Db 160 TCCTCCCTCTCCTCCTCCTCTTCCTCGTCCTCTTCCTCTTCCTTCTTCCTCTCTCTCTCT 101

Qy 166 CTTTCTCCCAGCATTCCTYTCTGGCAAACACTTCCTATAAACACACCGTGTGT 218
I I I I I I I I 11:111 I I I I I I III I I I I I I I I I I
Db 100 CTCTCTCTCTCTCTCTCTCTCTCACACACACACTCTCTCTTCCCCCCGTGTGT 48



RESULT 28

BX399635/c

LOCUS BX399635 1201 bp mRNA linear EST 13-MAY-2003
DEFINITION BX399635 Homo sapiens PLACENTA COT 25-NORMALIZED Homo sapiens cDNA
clone CS0DI079YD20 3-PRIME, mRNA sequence.
ACCESSION BX399635
VERSION BX399635.1 GI:30621940
KEYWORDS EST.
SOURCE Homo sapiens (human)
ORGANISM Homo sapiens
Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
Mammalia; Eutheria; Primates; Catarrhini; Hominidae; Homo.
REFERENCE 1 (bases 1 to 1201)
AUTHORS Li,W.B., Gruber,C., Jessee,J. and Polayes,D.
TITLE Full-length cDNA libraries and normalization
JOURNAL Unpublished (2001)
COMMENT Contact: Genoscope
Genoscope - Centre National de Sequencage
BP 191 91006 EVRY cedex - France
Email: seqref@genoscope.cns.fr, Web: www.genoscope.cns.fr
Library was constructed by Life Technologies, a division of
Invitrogen. This sequence belongs to sequence cluster 4412.f For
more information about this cluster, see
http://www.genoscope.cns.fr/
cgi-bin/cluster.cgi?seq=CS0DI079DB10NP1&cluster=4412.f. Contact:
Feng Liang Email: fliang@lifetech.com URL:
http://fulllength.invitrogen.com/ InVitroGen Corporation 1600
Faraday Avenue Genoscope sequence ID: CS0DI079DB10NP1.
FEATURES Location/Qualifiers
source 1..1201
/organism="Homo sapiens"
/mol_type="mRNA"
/db_xref="taxon:9606"
/clone="CS0DI079YD20"
/tissue_type="PLACENTA COT 25-NORMALIZED"
/clone_lib="Homo sapiens PLACENTA COT 25-NORMALIZED"
/note="1st strand cDNA was primed with a NotI-oligo(dT)
primer. Five prime end enriched, double-strand cDNA was
digested with Not I and cloned into the Not I and EcoR V
sites of the pCMVSPORT 6 vector. Library was normalized."



ORIGIN 



Query Match 2.5%; Score 39.6; DB 13; Length 1201;

Best Local Similarity 34.0%; Pred. No. 8.8;

Matches 54; Conservative 41; Mismatches 64; Indels 0; Gaps 0;



Qy 42 CTTCTTTGGTTTTCTCAGCCATGACCAGTGCTGTTTGTGCCCTTTGTGTGGCCTCCCCTG 101
: I I I : I I I I I : : : I I I : | :: | :: | : | ::::::: | ::: | :: | :
Db 928 STCGTTKTGTTTGGCCSSSYCTGGGKTTTTGKGGYKGKKCSCKKYKYTYKGBKBYCYYTK 869

Qy 102 CTGTTGGGCTCTCTCTGTCTTTGCTCCTTAGAGCTGGGGCACCTGAGCCCTCCTCTGTGC 161
: : I I I : I I I I I : I : I I I : I I : I I I : I I I : I
Db 868 SBTTTTTGYTCTCCCCCYCYGCCCTCYTCTTAAAAAAAHWMNCTGMTAATTTTTCYCTCT 809

Qy 162 CAGCCTTTCTCCCAGCATTCCTYTCTGGCAAACACTTCC 200
I I : I : : I : I :: I I : I I I I I : I I I
Db 808 AAAWCYATSCTMCMCCCYYCCCCMCTCGCCCCCCYTTTC 770



RESULT 29

BB612448/c

LOCUS BB612448 662 bp mRNA linear EST 31-AUG-2001
DEFINITION BB612448 RIKEN full-length enriched, 0 day neonate skin Mus
musculus cDNA clone 4632410H18 5', mRNA sequence.
ACCESSION BB612448
VERSION BB612448.1 GI:15394826
KEYWORDS EST.
SOURCE Mus musculus (house mouse)
ORGANISM Mus musculus
Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
Mammalia; Eutheria; Rodentia; Sciurognathi; Muridae; Murinae; Mus.
REFERENCE 1 (bases 1 to 662)
AUTHORS Arakawa,T., Carninci,P., Fukuda,S., Furuno,M., Hanagaki,T.,
Hara,A., Hiramoto,K., Hori,F., Ishii,Y., Ito,M., Kawai,J.,
Konno,H., Kouda,M., Koya,S., Matsuyama,T., Miyazaki,A., Nomura,K.,
Ohno,M., Okazaki,Y., Okido,T., Saito,R., Sakai,C., Sakai,K.,
Sano,H., Sasaki,D., Shibata,K., Shinagawa,A., Shiraki,T.,
Sogabe,Y., Suzuki,H., Tagami,M., Tagawa,A., Takahashi,F.,
Takeda,Y., Tanaka,T., Toya,T., Muramatsu,M. and Hayashizaki,Y.
TITLE RIKEN Mouse ESTs (Arakawa,T., et al. 2001)
JOURNAL Unpublished (2001)
COMMENT Contact: Yoshihide Hayashizaki
Laboratory for Genome Exploration Research Group, RIKEN Genomic
Sciences Center (GSC), Yokohama Institute
The Institute of Physical and Chemical Research (RIKEN)
1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa 230-0045, Japan
Tel: 81-45-503-9222
Fax: 81-45-503-9216
Email: genome-res@gsc.riken.go.jp,
URL: http://genome.gsc.riken.go.jp/
Carninci,P., Shibata,Y., Hayatsu,N., Sugahara,Y., Shibata,K.,
Itoh,M., Konno,H., Okazaki,Y., Muramatsu,M. and Hayashizaki,Y.
Normalization and subtraction of cap-trapper-selected cDNAs to
prepare full-length cDNA libraries for rapid discovery of new
genes. Genome Res. 10 (10), 1617-1630 (2000)
wagi,K., Fujiwake,S., Inoue,K., Togawa,Y., Izawa,M., Ohara,E.,
Watahiki,M., Yoneda,Y., Ishikawa,T., Ozawa,K., Tanaka,T.,
Matsuura,S., Kawai,J., Okazaki,Y., Muramatsu,M., Inoue,Y., Kira,A.
and Hayashizaki,Y.
RIKEN integrated sequence analysis (RISA) system--384-format
sequencing pipeline with 384 multicapillary sequencer. Genome Res.
10 (11), 1757-1771 (2000)
Konno,H., Fukunishi,Y., Shibata,K., Itoh,M., Carninci,P.,
Sugahara,Y. and Hayashizaki,Y.
Computer-based methods for the mouse full-length cDNA
encyclopedia: real-time sequence clustering for construction of a
nonredundant cDNA library. Genome Res. 11 (2), 281-289 (2001)
Yamanaka,I., Kiyosawa,H., Kondo,S., Saito,T., Shinagawa,A.,
Aizawa,K., Fukuda,S., Hara,A., Itoh,M., Kawai,J., Shibata,K.,
Arakawa,T., Ishii,Y. and Hayashizaki,Y.
Mapping of 19032 mouse cDNAs on mouse chromosomes. J. Struct.
Func. Genomics 2 pre, L72-L86 (2001)
Please visit our web site (http://genome.gsc.riken.go.jp) for
further details.
FEATURES Location/Qualifiers
source 1..662
/organism="Mus musculus"
/mol_type="mRNA"
/strain="C57BL/6J"
/db_xref="taxon:10090"
/clone="4632410H18"
/sex="mixed"
/tissue_type="skin"
/dev_stage="0 day neonate"
/lab_host="DH10B"
/clone_lib="RIKEN full-length enriched, 0 day neonate
skin"
/note="Site_1: SalI; Site_2: BamHI; cDNA library was
prepared and sequenced in Mouse Genome Encyclopedia
Project of Genome Exploration Research Group in Riken
Genomic Sciences Center and Genome Science Laboratory in
RIKEN. Division of Experimental Animal Research in Riken
contributed to prepare mouse tissues. 1st strand cDNA was
primed with a primer [5'
GAGAGAGAGAAGGATCCAAGAGCTCTTTTTTTTTTTTTTTTVN 3'], cDNA was
prepared by using trehalose thermo-activated reverse
transcriptase and subsequently enriched for full-length by
cap-trapper. cDNA went through one round of normalization
to Rot = 10.0 and subtraction to Rot = 100.0. Second
strand cDNA was prepared with the primer adapter of
sequence [5' GAGAGAGAGATTCTCGAGTTAATTAAATTAATCCCCCCCCCCCCC
3']. cDNA was cloned into the XhoI and BamHI sites.
Vector: a modified pBluescript KS(+) after bulk excision
from Lambda FLC I"



ORIGIN 



Query Match 2.5%; Score 39.4; DB 10; Length 662; 

Best Local Similarity 51.4%; Pred. No. 7; 

Matches 91; Conservative 0; Mismatches 86; Indels 0; Gaps 0; 

Qy 1388 GAGAGGCAGCCATGCATTTGGCATTTGAATACAATCTGGTGACTTGTCTGGCTGCCAATA 1447

II I I I I I I I I I I III I II I III III I II 

Db 466 GAGTGGCACCCATGGTTCTTGGGGTCCTAAATGAATGCTTAATTCTTCTTAATTAAAAAG 407

Qy 1448 GAACCTAGTACCAAAGTGAAATCTTGAGGAAAATCCCTGGAAAGAGTGGAAAGTCCTGCC 1507

I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I 

Db 406 GAAGCTGGAGGAAGAGTAGTGTAGTGAGGAGAAGCCTAGAGAAGGGTTCTGTGTGCTGAC 347

Qy 1508 TAACACGTAAGTGCCTTCTTTGCTTGTTTGATTGACTGTGATGCTAGAGAGCAAACC 1564 

III I I I I I I I I I I I I II II I I I I I I I I I I I 

Db 346 TGACTAATGGCTGCCATTTATTTTGGTATTGTTTTATGGCAGGCGAGGGACCAGAGC 290 



RESULT 30

BX414498

LOCUS BX414498 1141 bp mRNA linear EST 15-MAY-2003
DEFINITION BX414498 Homo sapiens THYMUS Homo sapiens cDNA clone CS0CAP001YI15
5-PRIME, mRNA sequence.
ACCESSION BX414498
VERSION BX414498.1 GI:30769188
KEYWORDS EST.
SOURCE Homo sapiens (human)
ORGANISM Homo sapiens
Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
Mammalia; Eutheria; Primates; Catarrhini; Hominidae; Homo.
REFERENCE 1 (bases 1 to 1141)
AUTHORS Li,W.B., Gruber,C., Jessee,J. and Polayes,D.
TITLE Full-length cDNA libraries and normalization
JOURNAL Unpublished (2001)
COMMENT Contact: Genoscope
Genoscope - Centre National de Sequencage
BP 191 91006 EVRY cedex - France
Email: seqref@genoscope.cns.fr, Web: www.genoscope.cns.fr
Library was constructed by Life Technologies, a division of
Invitrogen. This sequence belongs to sequence cluster 1974.f For
more information about this cluster, see
http://www.genoscope.cns.fr/
cgi-bin/cluster.cgi?seq=CS0CAP001AE08QP1&cluster=1974.f. Contact:
Feng Liang Email: fliang@lifetech.com URL:
http://fulllength.invitrogen.com/ InVitroGen Corporation 1600
Faraday Avenue Genoscope sequence ID: CS0CAP001AE08QP1.
FEATURES Location/Qualifiers
source 1..1141
/organism="Homo sapiens"
/mol_type="mRNA"
/db_xref="taxon:9606"
/clone="CS0CAP001YI15"
/tissue_type="THYMUS"
/clone_lib="Homo sapiens THYMUS"
/note="Vector: pCMVSPORT_6; 1st strand cDNA was primed
with a NotI-oligo(dT) primer. Five prime end enriched,
double-strand cDNA was digested with Not I and cloned into
the Not I and EcoRV sites of the pCMVSPORT 6 vector.
Library was not normalized."

ORIGIN 

Query Match 2.5%; Score 39.4; DB 13; Length 1141; 

Best Local Similarity 34.1%; Pred. No. 9.8; 

Matches 98; Conservative 46; Mismatches 143; Indels 0; Gaps 0; 

Qy 1193 AATCATGCCAGCAGAAGTGGGACAGGCAAATCCTCAAAGATGTCTCCTTGTACATCGAGA 1252

II III I I : I : I I : I I I I : I I I : III I : : .: : I 

Db 26 AAAAAAGCAGGCWGGWACCGGWCCGGAATWCCCGGGAWATCGTCGACSSASGSGDSSGGS 85 

Qy 1253 GTGGCCAGATTATGTGCATCTTAGGCAGCTCAGGTAAGTGCCTGGGGGGSCSGGGGCTCC 1312 

I : I I I III I I I I : I I I I I I II : II : : I I I 

Db 86 GGSGCGGCAGGAAGGGACGGCAGTCDCGCGCGGKGAGGAGCCGGGGKGGGGGAGCGGCKC 145 

Qy 1313 TGTACTTCTAAGGCAGGCTCTGGGAGGCTTTGGCTCYGTCTAAGCACAATGTTTAAGAAG 1372 

: I : I : I I I I I I I I : I I : I : : I I I I I I I : : I I 

Db 146 GKGGAGGCKACKGCAGCACKGGGGKGKCAGTKGTKGGKCCGACCCAGAACGCKKCAGKKC 205

Qy 1373 TRAGTTTAAGTTGTAGAGAGGCAGCCATGCATTTGGCATTTGAATACAATCTGGTGACTT 1432

: : : : I : I : I : I I I I : : : : I I : : : I I : I I I : : I 

Db 206 KGCKCKGCAAGGAKAKAKAAGAACKGAKKGGKGKGCCCGKKKAAKAAAAGAAPCAKGGAAA 265

Qy 1433 GTCTGGCTGCCAATAGAACCTAGTACCAAAGTGAAATCTTGAGGAAA 1479

: I I MM: I I : M I I : : I : I I 

Db 266 CKGAACAGCCAGAAGAAACCKKCCCGAACACKGAAACSAAKGGKGAA 312



RESULT 31 
BI028780 

LOCUS BI028780 361 bp mRNA linear EST 14-JUN-2001 

DEFINITION CM0-MT0180-230201-789-g06 MT0180 Homo sapiens cDNA, mRNA sequence. 

ACCESSION BI028780 



VERSION BI028780.1 GI:14435410
KEYWORDS EST.
SOURCE Homo sapiens (human)
ORGANISM Homo sapiens
Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
Mammalia; Eutheria; Primates; Catarrhini; Hominidae; Homo.
REFERENCE 1 (bases 1 to 361)
AUTHORS Dias Neto,E., Garcia Correa,R., Verjovski-Almeida,S., Briones,M.R.,
Nagai,M.A., da Silva,W. Jr., Zago,M.A., Bordin,S., Costa,F.F.,
Goldman,G.H., Carvalho,A.F., Matsukuma,A., Baia,G.S., Simpson,D.H.,
Brunstein,A., deOliveira,P.S., Bucher,P., Jongeneel,C.V.,
O'Hare,M.J., Soares,F., Brentani,R.R., Reis,L.F., de Souza,S.J. and
Simpson,A.J.
TITLE Shotgun sequencing of the human transcriptome with ORF expressed
sequence tags
JOURNAL Proc. Natl. Acad. Sci. U.S.A. 97 (7), 3491-3496 (2000)
MEDLINE 20202663
PUBMED 10737800
COMMENT Contact: Simpson A.J.G.
Laboratory of Cancer Genetics
Ludwig Institute for Cancer Research
Rua Prof. Antonio Prudente 109, 4 andar, 01509-010, Sao Paulo-SP,
Brazil
Tel: +55-11-2704922
Fax: +55-11-2707001
Email: asimpson@ludwig.org.br
This sequence was derived from the FAPESP/LICR Human Cancer Genome
Project. This entry can be seen in the following URL
(http://www.ludwig.org.br/scripts/gethtml2.pl?t1=CM0&t2=CM0-MT0180-
230201-789-g06&t3=2001-02-23&t4=1)
Seq primer: puc 18 forward
High quality sequence start: 21
High quality sequence stop: 73.
FEATURES Location/Qualifiers
source 1..361
/organism="Homo sapiens"
/mol_type="mRNA"
/db_xref="taxon:9606"
/dev_stage="Adult"
/clone_lib="MT0180"
/note="Organ: marrow; Vector: puc18; Site_1: SmaI; Site_2:
SmaI; A mini-library was made by cloning products derived
from ORESTES PCR (U.S. Letters Patent application No.
196,716 - Ludwig Institute for Cancer Research) profiles
into the pUC 18 vector. Reverse transcription of tissue
mRNA and cDNA amplification were performed under low
stringency conditions."



ORIGIN 



Query Match 2.5%; Score 39.2; DB 12; Length 361; 

Best Local Similarity 50.0%; Pred. No. 5.5; 

Matches 98; Conservative 0; Mismatches 98; Indels 0; Gaps 0; 

Qy 439 TTCTGAGTCCAGAGGGAGCCAGAGGGCCTCACATCAACAGAGGGTCTCTGAGCTCCCTGG 498

I I I II I I II I I I I I I II I I I I I II I I I I I I II 

Db 154 TTCGGAGAGCAGGGTGAGAGAGAGAGATATGGAGAAACAGTGCACCAGCGAGATGGATGA 213



Qy 499 AGCAAGGTTCGGTCACGGGCACAGAGGCTCGGCACAGCTTAGGTGTCCTGCATGTGTCCT 558

I I I I I I III III I I I I I I I I I I I III 

Db 214 GGGATGGGGGAGAGATGGGGACGGGGTGAGGGCACCCTGGAGGGGGACGCACAGGGCCAG 273

Qy 559 ACAGCGTCAGGTAAGGGGACCTCCACAGCAAAAAGCTAGGCTCTCTGATTGCCTTTTCTG 618 

I I I I I I I I III I I I I II I I I I I II I II I 

Db 274 AGAGAGACAGGAGAGGCTGAACCAAGAGTCAAGCACACACATAGCTGTGTGTGGGTGCTG 333 

Qy 619 AATGGGTGGGTGGGCC 634 

I I I I I I I I III I 
Db 334 GATGGGTGCGGGGGAC 349



RESULT 32
CK203027/c

LOCUS CK203027 837 bp mRNA linear EST 08-DEC-2003
DEFINITION FGAS011553 Triticum aestivum FGAS: Library 3 Gate 6 Triticum
aestivum cDNA, mRNA sequence.
ACCESSION CK203027
VERSION CK203027.1 GI:39565417
KEYWORDS EST.
SOURCE Triticum aestivum (bread wheat)
ORGANISM Triticum aestivum
Eukaryota; Viridiplantae; Streptophyta; Embryophyta; Tracheophyta;
Spermatophyta; Magnoliophyta; Liliopsida; Poales; Poaceae;
Pooideae; Triticeae; Triticum.
REFERENCE 1 (bases 1 to 837)
AUTHORS Allard,F., Crosby,W.L., Danyluk,J., Eudes,F., Frick,M., Gaudet,D.,
Genswein,B., Graf,R., Gulick,P., Hrycan,L.D., Laroche,A.,
Links,M.G., McCarthy,E.L., Monroy,A., Muzak,I., Nilson,D.,
Penniket,C., Roach,J.L. and Sarhan,F.
TITLE Functional Genomics of Abiotic Stress In Wheat and Canola Crops
JOURNAL Unpublished (2003)
COMMENT Contact: Wm L Crosby
Bioinformatics
University of Saskatchewan, Department of Computer Science
1C101 Engineering Building, 57 Campus Drive, Saskatoon,
Saskatchewan, S7N 5A9, Canada
Tel: 306 966 1769
Fax: 306 966 2033
Email: fgas_ests@cs.usask.ca
This sequence is the direct result of the Base calling software
Phred (default parameters). It is the raw base calls. To aid in the
identification of the high quality insert the software Lucy
(default parameters) has been run on this sequence. Lucy identified
the region [102,331].
Plate: L3C116 row: J column: 21.
FEATURES Location/Qualifiers
source 1..837
/organism="Triticum aestivum"
/mol_type="mRNA"
/db_xref="taxon:4565"
/clone_lib="Triticum aestivum FGAS: Library 3 Gate 6"
/note="Organ: Root; Vector: pCMV.SPORT6; Root tissue from
control, cold-acclimated and salt stressed wheat cultivar
Norstar. 7 mRNA populations were combined before
constructing the library; 7 day non-acclimated roots, 1,
23, and 53 days cold-acclimated at 4C, and 30 minutes, 3
hours and 6 hours treated roots with 200mM NaCl.
Non-acclimated and cold-acclimated plants were grown in
vermiculite while salt stressed plant were grown
hydroponically. First strand synthesis in this library was
done in the presence of methylated dCTP thereby protecting
from internal cleavage with NotI."



ORIGIN 



Query Match 2.5%; Score 39.2; DB 14; Length 837; 

Best Local Similarity 51.6%; Pred. No. 9.3; 

Matches 80; Conservative 3; Mismatches 72; Indels 0; Gaps 0; 

Qy 1231 GATGTCTCCTTGTACATCGAGAGTGGCCAGATTATGTGCATCTTAGGCAGCTCAGGTAAG 1290 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II 

Db 455 GAGGTGGCCGTGTTGCTGCTTGGTGGCCAGGGCAAATAAGAAGTAGGAGGGCCAGGGGAG 396

Qy 1291 TGCCTGGGGGGSCSGGGGCTCCTGTACTTCTAAGGCAGGCTCTGGGAGGCTTTGGCTCYG 1350 

I I I I M I I I : : I I III I II I I I II I I I I I I I I I 

Db 395 GGCCTGGGGTGCACTGCCCGGCGGGAAGAAGTTCGGAGGATTTGGGAGGCAGAAGGTCGG 336

Qy 1351 TCTAAGCACAATGTTTAAGAAGTRAGTTTAAGTTG 1385

II I I I I I I I I : I II I I III 

Db 335 TCGAAGAGGGAGGTGTCCGGTTGAAGTTTTGGTNG 301 



RESULT 33

AA914287/c

LOCUS AA914287 458 bp mRNA linear EST 14-APR-1998
DEFINITION vy99b08.r1 Soares_mammary_gland_NbMMG Mus musculus cDNA clone
IMAGE:1314327 5' similar to TR:O15273 O15273 TELETHONIN. ;, mRNA
sequence.
ACCESSION AA914287
VERSION AA914287.1 GI:3053679
KEYWORDS EST.
SOURCE Mus musculus (house mouse)
ORGANISM Mus musculus
Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
Mammalia; Eutheria; Rodentia; Sciurognathi; Muridae; Murinae; Mus.
REFERENCE 1 (bases 1 to 458)
AUTHORS Marra,M., Hillier,L., Allen,M., Bowles,M., Dietrich,N., Dubuque,T.,
Geisel,S., Kucaba,T., Lacy,M., Le,M., Martin,J., Morris,M.,
Schellenberg,K., Steptoe,M., Tan,F., Underwood,K., Moore,B.,
Theising,B., Wylie,T., Lennon,G., Soares,B., Wilson,R. and
Waterston,R.
TITLE The WashU-HHMI Mouse EST Project
JOURNAL Unpublished (1996)
COMMENT Contact: Marra M/Mouse EST Project
WashU-HHMI Mouse EST Project
Washington University School of MedicineP
4444 Forest Park Parkway, Box 8501, St. Louis, MO 63108
Tel: 314 286 1800
Fax: 314 286 1810
Email: mouseest@watson.wustl.edu
This clone is available royalty-free through LLNL; contact the
IMAGE Consortium (info@image.llnl.gov) for further information.
MGI:684623
Seq primer: -28m13 rev2 ET from Amersham
High quality sequence stop: 449.
FEATURES Location/Qualifiers
source 1..458
/organism="Mus musculus"
/mol_type="mRNA"
/strain="C57BL/6J"
/db_xref="taxon:10090"
/clone="IMAGE:1314327"
/sex="male"
/tissue_type="mammary gland"
/dev_stage="4 weeks"
/lab_host="DH10B"
/clone_lib="Soares_mammary_gland_NbMMG"
/note="Organ: mammary gland; Vector: pT7T3D-Pac
(Pharmacia) with a modified polylinker; Site_1: Not I;
Site_2: Eco RI; 1st strand cDNA was primed with a Not I -
oligo(dT) primer [5'
TGTTACCAATCTGAAGTGGGAGCGGCCGCGAATGGTTTTTTTTTTTTTTTTTTTTTTT
T 3']; double-stranded cDNA was ligated to Eco RI
adaptors (Pharmacia), digested with Not I and cloned into
the Not I and Eco RI sites of the modified pT7T3 vector.
RNA provided by Dr. Minoru Ko, Wayne State Univ. Library
constructed and normalized by Bento Soares and M.Fatima
Bonaldo."

ORIGIN 

Query Match 2.5%; Score 39; DB 9; Length 458; 

Best Local Similarity 54.5%; Pred. No. 7.3; 

Matches 78; Conservative 0; Mismatches 65; Indels 0; Gaps 0; 

Qy 408 TGCTTCCTGCTAGCCATGGGTGAGCTGCCCTTTCTGAGTCCAGAGGGAGCCAGAGGGCCT 467

III I I I I I I I I I II III III I I I I I II I I 

Db 218 TCCATTGTCCTAGCCAGGAAGTGCCTAGCCTGAGACAGACACAGTGGAGTCTGAGTCACA 159

Qy 468 CACATCAACAGAGGGTCTCTGAGCTCCCTGGAGCAAGGTTCGGTCACGGGCACAGAGGCT 527

II III II I I I I I I I I I I II I I I I I I I I I I I I II I I 

Db 158 CAGTCCATCTCAGCCTCTCTGAGCTTCCTGAGACATGGATCGAGACAGGGTACGGCGCAG 99

Qy 528 CGGCACAGCTTAGGTGTCCTGCA 550 

I I I I I I I I I I I III 
Db 98 GGGCCCGGGTTTGCTGACTGGCA 76 



RESULT 34

AA060852/c

LOCUS AA060852 493 bp mRNA linear EST 23-SEP-1996
DEFINITION mj86d02.r1 Soares mouse p3NMF19.5 Mus musculus cDNA clone
IMAGE:482979 5', mRNA sequence.
ACCESSION AA060852
VERSION AA060852.1 GI:1554690
KEYWORDS EST.
SOURCE Mus musculus (house mouse)
ORGANISM Mus musculus
Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
Mammalia; Eutheria; Rodentia; Sciurognathi; Muridae; Murinae; Mus.
REFERENCE 1 (bases 1 to 493)
AUTHORS Marra,M., Hillier,L., Allen,M., Bowles,M., Dietrich,N., Dubuque,T.,
Geisel,S., Kucaba,T., Lacy,M., Le,M., Martin,J., Morris,M.,
Schellenberg,K., Steptoe,M., Tan,F., Underwood,K., Moore,B.,
Theising,B., Wylie,T., Lennon,G., Soares,B., Wilson,R. and
Waterston,R.
TITLE The WashU-HHMI Mouse EST Project
JOURNAL Unpublished (1996)
COMMENT Contact: Marra M/Mouse EST Project
WashU-HHMI Mouse EST Project
Washington University School of MedicineP
4444 Forest Park Parkway, Box 8501, St. Louis, MO 63108
Tel: 314 286 1800
Fax: 314 286 1810
Email: mouseest@watson.wustl.edu
This clone is available royalty-free through LLNL; contact the
IMAGE Consortium (info@image.llnl.gov) for further information.
MGI:293723
Putative full length read
vector to vector length is 512
Seq primer: -28M13 rev2 from Amersham
High quality sequence stop: 481.
FEATURES Location/Qualifiers
source 1..493
/organism="Mus musculus"
/mol_type="mRNA"
/db_xref="taxon:10090"
/clone="IMAGE:482979"
/dev_stage="19.5 dpc total fetus"
/lab_host="DH10B (ampicillin resistant)"
/clone_lib="Soares mouse p3NMF19.5"
/note="Vector: pT7T3D (Pharmacia) with a modified
polylinker; Site_1: Not I; Site_2: Eco RI; 1st strand cDNA
was primed with a Not I - oligo(dT) primer [5'
TGTTACCAATCTGAAGTGGGAGCGGCCGCATTTTTTTTTTTTTTTTTTTT 3'],
double-stranded cDNA was size selected, ligated to Eco RI
adapters (Pharmacia), digested with Not I and cloned into
the Not I and Eco RI sites of a modified pT7T3 vector
(Pharmacia). Library went through one round of
normalization to a Cot = 5. Library constructed by Bento
Soares and M.Fatima Bonaldo. RNA was kindly provided by
Dr. Minoru Ko (Wayne State University)."



ORIGIN 



Query Match 2.5%; Score 39; DB 9; Length 493; 

Best Local Similarity 54.5%; Pred. No. 7.6; 

Matches 78; Conservative 0; Mismatches 65; Indels 0; Gaps 0; 

Qy 408 TGCTTCCTGCTAGCCATGGGTGAGCTGCCCTTTCTGAGTCCAGAGGGAGCCAGAGGGCCT 467

III I I I I I I I I I II III III I I I I I II I I 

Db 175 TCCATTGTCCTAGCCAGGAAGTGCCTAGCCTGAGACAGACACAGTGGAGTCTGAGTCACA 116 

Qy 468 CACATCAACAGAGGGTCTCTGAGCTCCCTGGAGCAAGGTTCGGTCACGGGCACAGAGGCT 527

M Ml M I M I I I I I I I I II I I I I I I I I I I 

Db 115 CAGTCCATCTCAGCCTCTCTGAGCTTCCTGAGACATGGATCGAGACAGGGTACGGCGCAG 56



Qy 528 CGGCACAGCTTAGGTGTCCTGCA 550
I I I I I I I I II I III
Db 55 GGGCCCGGGTTTGCTGACTGGCA 33



RESULT 35

AA882149/c

LOCUS AA882149 525 bp mRNA linear EST 26-MAR-1998
DEFINITION vx38e02.r1 Stratagene mouse lung 937302 Mus musculus cDNA clone
IMAGE:1277498 5' similar to TR:O15273 O15273 TELETHONIN. ;, mRNA
sequence.
ACCESSION AA882149
VERSION AA882149.1 GI:2991260
KEYWORDS EST.
SOURCE Mus musculus (house mouse)
ORGANISM Mus musculus
Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
Mammalia; Eutheria; Rodentia; Sciurognathi; Muridae; Murinae; Mus.
REFERENCE 1 (bases 1 to 525)
AUTHORS Marra,M., Hillier,L., Allen,M., Bowles,M., Dietrich,N., Dubuque,T.,
Geisel,S., Kucaba,T., Lacy,M., Le,M., Martin,J., Morris,M.,
Schellenberg,K., Steptoe,M., Tan,F., Underwood,K., Moore,B.,
Theising,B., Wylie,T., Lennon,G., Soares,B., Wilson,R. and
Waterston,R.
TITLE The WashU-HHMI Mouse EST Project
JOURNAL Unpublished (1996)
COMMENT Contact: Marra M/Mouse EST Project
WashU-HHMI Mouse EST Project
Washington University School of MedicineP
4444 Forest Park Parkway, Box 8501, St. Louis, MO 63108
Tel: 314 286 1800
Fax: 314 286 1810
Email: mouseest@watson.wustl.edu
This clone is available royalty-free through LLNL; contact the
IMAGE Consortium (info@image.llnl.gov) for further information.
MGI:669298
Seq primer: -28m13 rev1 ET from Amersham
High quality sequence stop: 493.
FEATURES Location/Qualifiers
source 1..525
/organism="Mus musculus"
/mol_type="mRNA"
/strain="C57BL/6 x CBA"
/db_xref="taxon:10090"
/clone="IMAGE:1277498"
/sex="female"
/tissue_type="lung"
/dev_stage="6-8 month old"
/lab_host="SOLR (kanamycin resistant)"
/clone_lib="Stratagene mouse lung 937302"
/note="Organ: lung; Vector: pBluescript SK-; Site_1:
EcoRI; Site_2: XhoI; Cloned unidirectionally. Primer:
Oligo dT. 6-8 month old female lung and 1.5 year old male
lung were source of mRNA. Average insert size: 1.5 kb;
Uni-ZAP XR Vector; ~5' adaptor sequence: 5' GAATTCGGCACGAG
3' ~3' adaptor sequence: 5' CTCGAGTTTTTTTTTTTTTTTTTT 3'"



ORIGIN 



Query Match 2.5%; Score 39; DB 9; Length 525;

Best Local Similarity 54.5%; Pred. No. 7.9;

Matches 78; Conservative 0; Mismatches 65; Indels 0; Gaps 0;



Qy 408 TGCTTCCTGCTAGCCATGGGTGAGCTGCCCTTTCTGAGTCCAGAGGGAGCCAGAGGGCCT 467

I I I I I I I I I I I I M I M | | | I I I I I I I I | 

Db 263 TCCATTGTCCTAGCCAGGAAGTGCCTAGCCTGAGACAGACACAGTGGAGTCTGAGTCACA 204

Qy 468 CACATCAACAGAGGGTCTCTGAGCTCCCTGGAGCAAGGTTCGGTCACGGGCACAGAGGCT 527

M I I I M I I I I I I I I I I I II I II II III I I I I I I | 

Db 203 CAGTCCATCTCAGCCTCTCTGAGCTTCCTGAGACATGGATCGAGACAGGGTACGGCGCAG 144

Qy 528 CGGCACAGCTTAGGTGTCCTGCA 550 

I I I I I I I I I I I Ml 
Db 143 GGGCCCGGGTTTGCTGACTGGCA 121 



RESULT 36
BI416074/c

LOCUS BI416074 856 bp mRNA linear EST 14-AUG-2001
DEFINITION 602987346F1 NCI_CGAP_Lu33 Mus musculus cDNA clone IMAGE:5143451 5',
mRNA sequence.
ACCESSION BI416074
VERSION BI416074.1 GI:15176984
KEYWORDS EST.
SOURCE Mus musculus (house mouse)
ORGANISM Mus musculus
Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
Mammalia; Eutheria; Rodentia; Sciurognathi; Muridae; Murinae; Mus.
REFERENCE 1 (bases 1 to 856)
AUTHORS NIH-MGC http://mgc.nci.nih.gov/.
TITLE National Institutes of Health, Mammalian Gene Collection (MGC)
JOURNAL Unpublished (1999)
COMMENT Contact: Robert Strausberg, Ph.D.
Email: cgapbs-r@mail.nih.gov
Tissue Procurement: Gilbert Smith, Ph.D.
cDNA Library Preparation: M. Bento Soares, Ph.D., M. Fatima
Bonaldo, Ph.D.
cDNA Library Arrayed by: The I.M.A.G.E. Consortium (LLNL)
DNA Sequencing by: Incyte Genomics, Inc.
Clone distribution: NCI-CGAP clone distribution information can be
found through the I.M.A.G.E. Consortium/LLNL at:
http://image.llnl.gov
Plate: LLAM11353 row: 1 column: 12
High quality sequence start: 17
High quality sequence stop: 856.
FEATURES Location/Qualifiers
source 1..856
/organism="Mus musculus"
/mol_type="mRNA"
/strain="Czech II"
/db_xref="taxon:10090"
/clone="IMAGE:5143451"
/tissue_type="pooled lung tumors"
/lab_host="DH10B (phage-resistant)"
/clone_lib="NCI_CGAP_Lu33"
/note="Organ: lung; Vector: pT7T3D-Pac (Pharmacia) with a
modified polylinker; Site_1: NotI; Site_2: EcoRI; 1st
strand cDNA was prepared from mRNA obtained from pooled
lung tumors with a Not I - oligo(dT) primer [5'
TGTTACCAATCTGAAGTGGGAGCGGCCGCCTCTGTTTTTTTTTTTTTTTTT 3'].
Double-stranded cDNA was ligated to Eco RI adaptors
(Pharmacia), digested with Not I and cloned into the Not
I and Eco RI sites of the modified pT7T3 vector. Library
went through one round of normalization, and was
constructed by Bento Soares and M. Fatima Bonaldo."

ORIGIN



Query Match 2.5%; Score 39; DB 12; Length 856; 

Best Local Similarity 54.5%; Pred. No. 11; 

Matches 78; Conservative 0; Mismatches 65; Indels 0; Gaps 0; 

Qy  408 TGCTTCCTGCTAGCCATGGGTGAGCTGCCCTTTCTGAGTCCAGAGGGAGCCAGAGGGCCT 467
        | | |  | ||||||| |      ||  |||     || |     |||| | |||   |
Db  561 TCCATTGTCCTAGCCAGGAAGTGCCTAGCCTGAGACAGACACAGTGGAGTCTGAGTCACA 502

Qy  468 CACATCAACAGAGGGTCTCTGAGCTCCCTGGAGCAAGGTTCGGTCACGGGCACAGAGGCT 527
        ||   || |  ||  |||||||||| ||||   || || |||     ||| || | |
Db  501 CAGTCCATCTCAGCCTCTCTGAGCTTCCTGAGACATGGATCGAGACAGGGTACGGCGCAG 442

Qy  528 CGGCACAGCTTAGGTGTCCTGCA 550
         ||| | | || | || |  |||
Db  441 GGGCCCGGGTTTGCTGACTGGCA 419
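
The percentages in the summary lines of each result appear to follow simple arithmetic on the reported counts. The sketch below is an inference from the figures in this listing, not a documented formula of the search program: it assumes Best Local Similarity is the number of exact matches divided by the number of aligned columns, and Query Match is the score divided by the perfect score (1570 for this query). The function names are hypothetical.

```python
def best_local_similarity(matches, conservative, mismatches, indels):
    """Percent of aligned columns that are exact matches (assumed definition)."""
    columns = matches + conservative + mismatches + indels
    return 100.0 * matches / columns


def query_match(score, perfect_score):
    """Score expressed as a percentage of the perfect score (assumed definition)."""
    return 100.0 * score / perfect_score


# Counts reported for RESULT 36; the perfect score comes from the run header.
bls = best_local_similarity(matches=78, conservative=0, mismatches=65, indels=0)
qm = query_match(score=39, perfect_score=1570)
print(f"Best Local Similarity {bls:.1f}%")  # 54.5%
print(f"Query Match {qm:.1f}%")             # 2.5%
```

Under the same assumption, the counts for RESULT 40 (59 matches, 188 conservative, 210 mismatches) give 12.9%, which agrees with the value printed for that result.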



RESULT 37 

CB590318/c 

LOCUS       CB590318                 929 bp    mRNA    linear   EST 03-APR-2003
DEFINITION  AGENCOURT_12605689 NIH_MGC_136 Mus musculus cDNA clone
            IMAGE:30287594 5', mRNA sequence.
ACCESSION   CB590318
VERSION     CB590318.1  GI:29508174
KEYWORDS    EST.
SOURCE      Mus musculus (house mouse)
  ORGANISM  Mus musculus
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Rodentia; Sciurognathi; Muridae; Murinae; Mus.
REFERENCE   1  (bases 1 to 929)
  AUTHORS   NIH-MGC http://mgc.nci.nih.gov/.
  TITLE     National Institutes of Health, Mammalian Gene Collection (MGC)
  JOURNAL   Unpublished (1999)
COMMENT     Contact: Robert Strausberg, Ph.D.
            Email: cgapbs-r@mail.nih.gov
            Tissue Procurement: Dr. David Rowe
            cDNA Library Preparation: Invitrogen Corp
            cDNA Library Arrayed by: The I.M.A.G.E. Consortium (LLNL)
            DNA Sequencing by: Agencourt Bioscience Corporation
            Clone distribution: MGC clone distribution information can be
            found through the I.M.A.G.E. Consortium/LLNL at:
            http://image.llnl.gov
            Plate: NDAM323 row: 1 column: 03
            High quality sequence start: 17
            High quality sequence stop: 520.
FEATURES             Location/Qualifiers
     source          1..929
                     /organism="Mus musculus"
                     /mol_type="mRNA"
                     /db_xref="taxon:10090"
                     /clone="IMAGE:30287594"
                     /tissue_type="embryonic limb, maxilla and mandible"
                     /lab_host="DH10B (phage-resistant)"
                     /clone_lib="NIH_MGC_136"
                     /note="Vector: pCMV-SPORT6.1; Site_1: EcoRV; Site_2: NotI;
                     Normalized, full-length enriched library from pool of
                     mouse embronic limb, maxilla and mandible, embryonic day
                     17.5, 18.5 and newborn (mandible (5, 4 and 1 limb and jaw
                     equivalents from respective days). Cloned directionally,
                     oligo-dT primed (5'-GACTAGTTCTAGATCGCGAGCGGCCGCCC(T)15-3'.
                     Size selected for the >1kb fragments, average insert size
                     1.2 kb. Normalization to Cot 7.5. Tissue contributed by
                     David Rowe; library constructed by ResGen, Invitrogen
                     Corp. Note: this is a NIH_MGC Library."
ORIGIN



Query Match 2.5%; Score 39; DB 14; Length 929; 

Best Local Similarity 54.5%; Pred. No. 11; 

Matches 78; Conservative 0; Mismatches 65; Indels 0; Gaps 0; 

Qy  408 TGCTTCCTGCTAGCCATGGGTGAGCTGCCCTTTCTGAGTCCAGAGGGAGCCAGAGGGCCT 467
        | | |  | ||||||| |      ||  |||     || |     |||| | |||   |
Db  597 TCCATTGTCCTAGCCAGGAAGTGCCTAACCTGAGACAGACACAGTGGAGTCTGAGTCACA 538

Qy  468 CACATCAACAGAGGGTCTCTGAGCTCCCTGGAGCAAGGTTCGGTCACGGGCACAGAGGCT 527
        ||   || |  ||  |||||||||| ||||   || || |||     ||| || | |
Db  537 CAGTCCATCTCAGCCTCTCTGAGCTTCCTGAGACATGGATCGAGACAGGGTACGGCGCAG 478

Qy  528 CGGCACAGCTTAGGTGTCCTGCA 550
         ||| | | || | || |  |||
Db  477 GGGCCCGGGTTTGCTGACTGGCA 455



RESULT 38 

AK010167/c
LOCUS       AK010167                 933 bp    mRNA    linear   HTC 20-SEP-2003
DEFINITION  Mus musculus adult male tongue cDNA, RIKEN full-length enriched
            library, clone:2310075E03 product:titin-cap, full insert sequence.
ACCESSION   AK010167
VERSION     AK010167.1  GI:12845417
KEYWORDS    HTC; CAP trapper.
SOURCE      Mus musculus (house mouse)
  ORGANISM  Mus musculus
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Rodentia; Sciurognathi; Muridae; Murinae; Mus.
REFERENCE   1
  AUTHORS   Carninci,P. and Hayashizaki,Y.
  TITLE     High-efficiency full-length cDNA cloning
  JOURNAL   Meth. Enzymol. 303, 19-44 (1999)
  MEDLINE   99279253
   PUBMED   10349636
REFERENCE   2
  AUTHORS   Carninci,P., Shibata,Y., Hayatsu,N., Sugahara,Y., Shibata,K.,
            Itoh,M., Konno,H., Okazaki,Y., Muramatsu,M. and Hayashizaki,Y.
  TITLE     Normalization and subtraction of cap-trapper-selected cDNAs to
            prepare full-length cDNA libraries for rapid discovery of new genes
  JOURNAL   Genome Res. 10 (10), 1617-1630 (2000)
  MEDLINE   20499374
   PUBMED   11042159
REFERENCE   3
  AUTHORS   Shibata,K., Itoh,M., Aizawa,K., Nagaoka,S., Sasaki,N., Carninci,P.,
            Konno,H., Akiyama,J., Nishi,K., Kitsunai,T., Tashiro,H., Itoh,M.,
            Sumi,N., Ishii,Y., Nakamura,S., Hazama,M., Nishine,T., Harada,A.,
            Yamamoto,R., Matsumoto,H., Sakaguchi,S., Ikegami,T., Kashiwagi,K.,
            Fujiwake,S., Inoue,K., Togawa,Y., Izawa,M., Ohara,E., Watahiki,M.,
            Yoneda,Y., Ishikawa,T., Ozawa,K., Tanaka,T., Matsuura,S., Kawai,J.,
            Okazaki,Y., Muramatsu,M., Inoue,Y., Kira,A. and Hayashizaki,Y.
  TITLE     RIKEN integrated sequence analysis (RISA) system -- 384-format
            sequencing pipeline with 384 multicapillary sequencer
  JOURNAL   Genome Res. 10 (11), 1757-1771 (2000)
  MEDLINE   20530913
   PUBMED   11076861
REFERENCE   4
  AUTHORS   The RIKEN Genome Exploration Research Group Phase II Team and the
            FANTOM Consortium.
  TITLE     Functional annotation of a full-length mouse cDNA collection
  JOURNAL   Nature 409, 685-690 (2001)
REFERENCE   5
  AUTHORS   The FANTOM Consortium and the RIKEN Genome Exploration Research
            Group Phase I & II Team.
  TITLE     Analysis of the mouse transcriptome based on functional annotation
            of 60,770 full-length cDNAs
  JOURNAL   Nature 420, 563-573 (2002)
REFERENCE   6  (bases 1 to 933)
  AUTHORS   Adachi,J., Aizawa,K., Akahira,S., Akimura,T., Arai,A., Aono,H.,
            Arakawa,T., Bono,H., Carninci,P., Fukuda,S., Fukunishi,Y.,
            Furuno,M., Hanagaki,T., Hara,A., Hayatsu,N., Hiramoto,K.,
            Hiraoka,T., Hori,F., Imotani,K., Ishii,Y., Itoh,M., Izawa,M.,
            Kasukawa,T., Kato,H., Kawai,J., Kojima,Y., Konno,H., Kouda,M.,
            Koya,S., Kurihara,C., Matsuyama,T., Miyazaki,A., Nishi,K.,
            Nomura,K., Numazaki,R., Ohno,M., Okazaki,Y., Okido,T., Owa,C.,
            Saito,H., Saito,R., Sakai,C., Sakai,K., Sano,H., Sasaki,D.,
            Shibata,K., Shibata,Y., Shinagawa,A., Shiraki,T., Sogabe,Y.,
            Suzuki,H., Tagami,M., Tagawa,A., Takahashi,F., Tanaka,T.,
            Tejima,Y., Toya,T., Yamamura,T., Yasunishi,A., Yoshida,K.,
            Yoshino,M., Muramatsu,M. and Hayashizaki,Y.
  TITLE     Direct Submission
  JOURNAL   Submitted (10-JUL-2000) Yoshihide Hayashizaki, The Institute of
            Physical and Chemical Research (RIKEN), Laboratory for Genome
            Exploration Research Group, RIKEN Genomic Sciences Center (GSC),
            RIKEN Yokohama Institute; 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama,
            Kanagawa 230-0045, Japan (E-mail:genome-res@gsc.riken.go.jp,
            URL:http://genome.gsc.riken.go.jp/, Tel:81-45-503-9222,
            Fax:81-45-503-9216)
COMMENT     Please visit our web site (http://genome.gsc.riken.go.jp/) for
            further details.
            cDNA library was prepared and sequenced in Mouse Genome
            Encyclopedia Project of Genome Exploration Research Group in Riken
            Genomic Sciences Center and Genome Science Laboratory in RIKEN.
            Division of Experimental Animal Research in Riken contributed to
            prepare mouse tissues. First strand cDNA was primed with a primer
            [5' GAGAGAGAGAAGGATCCAAGAGCTCTTTTTTTTTTTTTTTTVN 3'], cDNA was
            prepared by using trehalose thermo-activated reverse transcriptase
            and subsequently enriched for full-length by cap-trapper. Second
            strand cDNA was prepared with the primer adapter of sequence [5'
            GAGAGAGAGATTCTCGAGTTAATTAAATTAATCCCCCCCCCCCCC 3']. cDNA was cleaved
            with XhoI and SstI. Cloning sites, 5' end: XhoI; 3' end: SstI.
            Host: SOLR.
FEATURES             Location/Qualifiers
     source          1..933
                     /organism="Mus musculus"
                     /mol_type="mRNA"
                     /strain="C57BL/6J"
                     /db_xref="FANTOM_DB:2310075E03"
                     /db_xref="MGI:1910392"
                     /db_xref="taxon:10090"
                     /clone="2310075E03"
                     /sex="male"
                     /tissue_type="tongue"
                     /clone_lib="RIKEN full-length enriched mouse cDNA library"
                     /dev_stage="adult"
     CDS             25..528
                     /note="unnamed protein product; putative
                     titin-cap (MGD|MGI:1330233)"
                     /codon_start=1
                     /protein_id="BAB26743.1"
                     /db_xref="GI:12845418"
                     /translation="MATSELSCQVSEENQERREAFWAEWKDLTLSTRPEEGCSLHEED
                     TQRHETYHRQGQCQAWQRSPWLVMRLGILGRGLQEYQLPYQRVLPLPIFTPTKVGAS
                     KEEREETPIQLRELLALETALGGQCVERQDVAEITKQLPPWPVSKPGPLRRTLSRSM
                     SQEAQRG"
ORIGIN



Query Match 2.5%; Score 39; DB 11; Length 933; 

Best Local Similarity 54.5%; Pred. No. 11; 

Matches 78; Conservative 0; Mismatches 65; Indels 0; Gaps 0;

Qy  408 TGCTTCCTGCTAGCCATGGGTGAGCTGCCCTTTCTGAGTCCAGAGGGAGCCAGAGGGCCT 467
        | | |  | ||||||| |      ||  |||     || |     |||| | |||   |
Db  597 TCCATTGTCCTAGCCAGGAAGTGCCTAGCCTGAGACAGACACAGTGGAGTCTGAGTCACA 538

Qy  468 CACATCAACAGAGGGTCTCTGAGCTCCCTGGAGCAAGGTTCGGTCACGGGCACAGAGGCT 527
        ||   || |  ||  |||||||||| ||||   || || |||     ||| || | |
Db  537 CAGTCCATCTCAGCCTCTCTGAGCTTCCTGAGACATGGATCGAGACAGGGTACGGCGCAG 478

Qy  528 CGGCACAGCTTAGGTGTCCTGCA 550
         ||| | | || | || |  |||
Db  477 GGGCCCGGGTTTGCTGACTGGCA 455



RESULT 39 

BC032992/c 

LOCUS       BC032992                 946 bp    mRNA    linear   HTC 20-SEP-2002
DEFINITION  Mus musculus, clone IMAGE:1281423, mRNA.
ACCESSION   BC032992
VERSION     BC032992.1  GI:21426937
KEYWORDS    HTC.
SOURCE      Mus musculus (house mouse)
  ORGANISM  Mus musculus
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Rodentia; Sciurognathi; Muridae; Murinae; Mus.
REFERENCE   1  (bases 1 to 946)
  AUTHORS   Strausberg,R.
  TITLE     Direct Submission
  JOURNAL   Submitted (14-JUN-2002) National Institutes of Health, Mammalian
            Gene Collection (MGC), Cancer Genomics Office, National Cancer
            Institute, 31 Center Drive, Room 11A03, Bethesda, MD 20892-2590,
            USA
  REMARK    NIH-MGC Project URL: http://mgc.nci.nih.gov
COMMENT     Contact: MGC help desk
            Email: cgapbs-r@mail.nih.gov
            Tissue Procurement: Marcello Bento Soares, Ph.D.
            cDNA Library Preparation: Soares Laboratory
            cDNA Library Arrayed by: The I.M.A.G.E. Consortium (LLNL)
            DNA Sequencing by: Institute for Systems Biology
            http://www.systemsbiology.org
            contact: amadan@systemsbiology.org
            Anup Madan, Jessica Fahey, Erin Helton, Mark Ketteman, Anuradha
            Madan, Stephanie Rodrigues, Amy Sanchez and Michelle Whiting
            Clone distribution: MGC clone distribution information can be found
            through the I.M.A.G.E. Consortium/LLNL at: http://image.llnl.gov
            Series: IRAK Plate: 66 Row: k Column: 11
            This clone was selected for full length sequencing because it
            passed the following selection criteria: Hexamer frequency ORF
            analysis
            This clone has the following problem: no 5' EST match.
FEATURES             Location/Qualifiers
     source          1..946
                     /organism="Mus musculus"
                     /mol_type="mRNA"
                     /strain="C57BL/6J"
                     /db_xref="taxon:10090"
                     /clone="IMAGE:1281423"
                     /tissue_type="Thymus gland, mouse"
                     /clone_lib="Soares_thymus_2NbMT"
                     /lab_host="DH10B"
                     /note="Vector: pT7T3D-Pac"
ORIGIN



Query Match 2.5%; Score 39; DB 11; Length 946; 

Best Local Similarity 54.5%; Pred. No. 11; 

Matches 78; Conservative 0; Mismatches 65; Indels 0; Gaps 0;

Qy  408 TGCTTCCTGCTAGCCATGGGTGAGCTGCCCTTTCTGAGTCCAGAGGGAGCCAGAGGGCCT 467
        | | |  | ||||||| |      ||  |||     || |     |||| | |||   |
Db  581 TCCATTGTCCTAGCCAGGAAGTGCCTAGCCTGAGACAGACACAGTGGAGTCTGAGTCACA 522

Qy  468 CACATCAACAGAGGGTCTCTGAGCTCCCTGGAGCAAGGTTCGGTCACGGGCACAGAGGCT 527
        ||   || |  ||  |||||||||| ||||   || || |||     ||| || | |
Db  521 CAGTCCATCTCAGCCTCTCTGAGCTTCCTGAGACATGGATCGAGACAGGGTACGGCGCAG 462

Qy  528 CGGCACAGCTTAGGTGTCCTGCA 550
         ||| | | || | || |  |||
Db  461 GGGCCCGGGTTTGCTGACTGGCA 439



RESULT 40 

CNS0037Q/c 

LOCUS       CNS0037Q                1101 bp    DNA     linear   GSS 03-JUN-199
DEFINITION  Drosophila melanogaster genome survey sequence TET3 end of BAC #
            BACR08K14 of RPCI-98 library from Drosophila melanogaster (fruit
            fly), genomic survey sequence.
ACCESSION   AL064465
VERSION     AL064465.1  GI:4941722
KEYWORDS    GSS.
SOURCE      Drosophila melanogaster (fruit fly)
  ORGANISM  Drosophila melanogaster
            Eukaryota; Metazoa; Arthropoda; Hexapoda; Insecta; Pterygota;
            Neoptera; Endopterygota; Diptera; Brachycera; Muscomorpha;
            Ephydroidea; Drosophilidae; Drosophila.
REFERENCE   1  (bases 1 to 1101)
  AUTHORS   Genoscope.
  TITLE     Direct Submission
  JOURNAL   Submitted (02-JUN-1999) Genoscope - Centre National de Sequencage
            BP 191 91006 EVRY cedex - FRANCE (E-mail : seqref@genoscope.cns.fr
            - Web : www.genoscope.cns.fr)
COMMENT     Determination of this BAC-end sequence was carried out as part of a
            collaboration with the Berkeley Drosophila Genome Project (BDGP).
            The BDGP is constructing a physical map of the Drosophila
            melanogaster genome using these BACs. For further information
            please see http://www.fruitfly.org The BDGP Drosophila
            melanogaster BAC library was prepared by Kazutoyo Osoegawa and
            Aaron Mammoser in Pieter de Jong's laboratory in the Department of
            Cancer Genetics at the Roswell Park Cancer Institute in Buffalo,
            NY. The library is named RPCI-98 and was constructed by partial
            EcoRI digestion of Drosophila DNA provided by the BDGP from the
            isogenic strain y2; cn bw sp, the same strain used for the BDGP's
            P1 and EST libraries. A more detailed description of the library
            and how to order individual BAC clones, the entire library, or
            filters for hybridization from the BACPAC Resource Center can be
            found at http://bacpac.med.buffalo.edu/drosophila_bac.htm.
FEATURES             Location/Qualifiers
     source          1..1101
                     /organism="Drosophila melanogaster"
                     /mol_type="genomic DNA"
                     /db_xref="taxon:7227"
                     /clone="BACR08K14"
                     /clone_lib="RPCI-98"
                     /note="end : TET3"
ORIGIN



Query Match 2.5%; Score 39; DB 29; Length 1101; 

Best Local Similarity 12.9%; Pred. No. 13; 

Matches 59; Conservative 188; Mismatches 210; Indels 0; Gaps 0;

Qy  409 GCTTCCTGCTAGCCATGGGTGAGCTGCCCTTTCTGAGTCCAGAGGGAGCCAGAGGGCCTC 468
        ::: :::  :::: :: :: : :|:  :  : : : :: :      :     :|:::: :
Db 1034 KMKNBMKNNBVKVKMKCKBABNKCKMKMNCKMBMKNVBGBKCBCNMMCKASCMGSBMSCS 975

Qy  469 ACATCAACAGAGGGTCTCTGAGCTCCCTGGAGCAAGGTTCGGTCACGGGCACAGAGGCTC 528
         ::      : :::||:::: : : | : :  :: : ::::::   :   :|  : :: :
Db  974 CSRCKCKNKKBKBKTCKBKKKBKKKCKBTBTMBMBKNBKYKBKKYNKCNKMCBYDCBBCY 915

Qy  529 GGCACAGCTTAGGTGTCCTGCATGTGTCCTACAGCGTCAGGTAAGGGGACCTCCACAGCA 588
Db  914 CKCKHKYKCKCKVWKBDAADAKNKNANAAAAAAMDHMDVMBAMBS 855

Qy  589 AAAAGCTAGGCTCTCTGATTGCCTTTTCTGAATGGGTGGGTGGGCCTGTGGGCTTTGGGT 648
             :  : :  :||  |: : ::::::: :| ::::  :   ::: |   : | :  :
Db  854 CNCKNBVBKNBANDCTCNTKWYTWDYKYYKTHTMKBYKTCYMTMBYYTTCTWYATMKTYY 795

Qy  649 TGTCTGTCCAGCAGATCAGGGTGAAAGTGGACAGTCTGTAACAACAGTGAGTCGTTCCTC 708
        | ||:   :|: : ||:   : ::::  : :|  :|: ::| :||:        ||: |
Db  794 TMTCBTCTYAKTWTATMTCHKMKHMMMMDMWCKMKCKMHMATMACMMMNTMMTYTTMTTT 735

Qy  709 CTCCTCCTCCTGCGCAGGGCAGAGCCTGGACATTAAAACATGCCCTGCCTGAAGCCGCTT 768
        : 1 : 1 : 1 1 : : : ::::::: : : | : : : : : :
Db  734 YTYKTAYTKTTCTYTKTBTKYAMAKAHAATTMBNHWBYC^^ 675

Qy  769 GCTGCTTCTCACTGATTTCTGCTCTCCCCTTCCTTGACTCGCCCACCACCTGTCCTGTGT 828
         ::       |:: |: : |  |:|:      |::: :::| :  ::: ||: || ::: :
Db  674 AMKTWCNTGTAYKYAKNHTTCNTBTSTWKMNCMYBHHMYCHMNTRYMTCCHCTCAYKYAH 615

Qy  829 AGATGGAGAAGGCTCGGAGAGTGGGGGTGCTGGGGGC 865
        :  :: :  ||  :| ::::   :|:|::  | ||||
Db  614 RTSHRYDYTAGMADCTVDDRNRTRGVGDRRVGAGGGC 578





RESULT 41 

BQ752298 

LOCUS       BQ752298                 834 bp    mRNA    linear   EST 18-JUL-2002
DEFINITION  EST632861 DSCT Colletotrichum trifolii cDNA clone pDSCTll-59, mRNA
            sequence.
ACCESSION   BQ752298
VERSION     BQ752298.1  GI:21907703
KEYWORDS    EST.
SOURCE      Colletotrichum trifolii
  ORGANISM  Colletotrichum trifolii
            Eukaryota; Fungi; Ascomycota; Pezizomycotina; Sordariomycetes;
            Sordariomycetes incertae sedis; Phyllachorales; Phyllachoraceae;
            mitosporic Phyllachoraceae; Colletotrichum.
REFERENCE   1  (bases 1 to 834)
  AUTHORS   Samac,D.A., Dickman,M., Town,C.D., VanAken,S., Utterback,T.,
            Cheung,F. and Fraser,C.M.
  TITLE     ESTs from mycelia of Colletotrichum trifolii race 1
  JOURNAL   Unpublished (2002)
COMMENT     Other_ESTs: EST632862
            Contact: Deborah A. Samac
            Department of Plant Pathology
            University of Minnesota
            495 Borlaug Hall, 1991 Upper Buford Circle, St. Paul, MN 55108, USA
            Tel: 612 625 1243
            Fax: 651 649 5058
            Email: debbys@puccini.crl.umn.edu
            TIGR sequence name: MTSAK59TK More information is available at:
            www.medicago.org
            Seq primer: SKmod (CTA gAA CTA gtg gAT CC).
FEATURES             Location/Qualifiers
     source          1..834
                     /organism="Colletotrichum trifolii"
                     /mol_type="mRNA"
                     /strain="race 1"
                     /db_xref="taxon:5466"
                     /clone="pDSCTll-59"
                     /tissue_type="mycelia"
                     /dev_stage="Young, actively growing mycelia (3 days after
                     inoculation) grown in liquid culture (cutin minimal medium
                     containing 2%glucose)."
                     /lab_host="DH5alpha"
                     /clone_lib="DSCT"
                     /note="Vector: pBluescript SK+; Site_1: EcoRI; Site_2:
                     EcoRI; isolate: 2sp2; cDNA was prepared from polyA+
                     enriched RNA The cDNA was ligated into Lambda gt11 from
                     Stratagene and packaged using Gigapack packaging extracts.
                     An aliquot of the amplified library was used to transduce
                     E. coli Y1090 and phage DNA was purified from a liquid
                     lysate. The cDNA inserts were gel purified after EcoRI
                     digestion and ligated into pBluescript SK+. Aliquots of
                     the ligation were used to transform E. coli DH5alpha which
                     were plated onto medium with X-gal for selection of
                     recombinants."
ORIGIN

Query Match 2.5%; Score 38.8; DB 13; Length 834; 

Best Local Similarity 53.2%; Pred. No. 12; 

Matches 82; Conservative 0; Mismatches 72; Indels 0; Gaps 0; 

Qy  431 GCTGCCCTTTCTGAGTCCAGAGGGAGCCAGAGGGCCTCACATCAACAGAGGGTCTCTGAG 490
        ||||     ||   | || |  || ||     ||| |  |||| || | || |   ||||
Db  424 GCTGGGAGGTCGCGGCCCGGGTGGCGCGCCGAGGCTTTCCATCGACGGCGGCTTGTTGAG 483

Qy  491 CTCCCTGGAGCAAGGTTCGGTCACGGGCACAGAGGCTCGGCACAGCTTAGGTGTCCTGCA 550
        |||| |   | ||||| ||  |   ||| |    ||||| | ||     || ||| | ||
Db  484 CTCCGTTCCGGAAGGTCCGCCCGGAGGCGCGCTCGCTCGTCCCATAGGCGGCGTCATTCA 543

Qy  551 TGTGTCCTACAGCGTCAGGTAAGGGGACCTCCAC 584
        || ||    | ||| | |       | | || ||
Db  544 TGCGTGGGGCGGCGGCGGCGTTCCCGTCGTCGAC 577



RESULT 42 

AA390068/c
LOCUS       AA390068                 551 bp    mRNA    linear   EST 23-APR-1997
DEFINITION  mv35b05.r1 GuayWoodford Beier mouse kidney day 0 Mus musculus cDNA
            clone IMAGE:657009 5' similar to gb:J05021 EZRIN (HUMAN); gb:X60671
            M. musculus mRNA for ezrin (MOUSE);, mRNA sequence.
ACCESSION   AA390068
VERSION     AA390068.1  GI:2043083
KEYWORDS    EST.
SOURCE      Mus musculus (house mouse)
  ORGANISM  Mus musculus
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Rodentia; Sciurognathi; Muridae; Murinae; Mus.
REFERENCE   1  (bases 1 to 551)
  AUTHORS   Marra,M., Hillier,L., Allen,M., Bowles,M., Dietrich,N., Dubuque,T.,
            Geisel,S., Kucaba,T., Lacy,M., Le,M., Martin,J., Morris,M.,
            Schellenberg,K., Steptoe,M., Tan,F., Underwood,K., Moore,B.,
            Theising,B., Wylie,T., Lennon,G., Soares,B., Wilson,R. and
            Waterston,R.
  TITLE     The WashU-HHMI Mouse EST Project
  JOURNAL   Unpublished (1996)
COMMENT     Contact: Marra M/Mouse EST Project
            WashU-HHMI Mouse EST Project
            Washington University School of MedicineP
            4444 Forest Park Parkway, Box 8501, St. Louis, MO 63108
            Tel: 314 286 1800
            Fax: 314 286 1810
            Email: mouseest@watson.wustl.edu
            This clone is available royalty-free through LLNL ; contact the
            IMAGE Consortium (info@image.llnl.gov) for further information.
            MGI:402857
            High quality sequence stop: 279.
FEATURES             Location/Qualifiers
     source          1..551
                     /organism="Mus musculus"
                     /mol_type="mRNA"
                     /strain="C57BL/6J"
                     /db_xref="taxon:10090"
                     /clone="IMAGE:657009"
                     /tissue_type="kidney"
                     /dev_stage="newborn (day 0)"
                     /lab_host="SOLR (kanamycin resistant)"
                     /clone_lib="GuayWoodford Beier mouse kidney day 0"
                     /note="Organ: kidney; Vector: pBluescript SK-; Site_1:
                     EcoRI; Site_2: XhoI; Cloned unidirectionally. Primer:
                     Oligo dT. Average insert size: 1.0 kb; Uni-ZAP XR Vector;
                     ~5' adaptor sequence: 5' GAATTCGGCACGAG 3' -3' adaptor
                     sequence: 5' CTCGAGTTTTTTTTTTTTTTTTTT 3' Library provided
                     Lisa Guay-Woodford."
ORIGIN



Query Match 2.5%; Score 38.6; DB 9; Length 551; 

Best Local Similarity 53.0%; Pred. No. 11; 

Matches 80; Conservative 0; Mismatches 71; Indels 0; Gaps 0; 



Qy   27 TCCACAGCTGGGTCTCTTCTTTGGTTTTCTCAGCCATGACCAGTGCTGTTTGTGCCCTTT 86
        ||  ||| ||   ||| |||||||| ||| |      | || | |||  ||  ||||  |
Db  197 TCACCAGGTGCAGCTCCTCTTTGGTCTTCACCAGGTCGTCCTGGGCTTCTTTAGCCCGGT 138

Qy   87 GTGTGGCCTCCCCTGCTGTTGGGCTCTCTCTGTCTTTGCTCCTTAGAGCTGGGGCACCTG 146
        |      |||  || |         |  ||  ||   ||  | |  ||| | |  | ||
Db  137 GCTGCCACTCTTCTACCTCGTCCTCCNNTCGCTCCGCGCCTCCTCCAGCAGTGCGATCTT 78

Qy  147 AGCCCTCCTCTGTGCCAGCCTTTCTCCCAGC 177
         ||| |   || |||||||  | |  |||||
Db   77 GGCCGTGTACTCTGCCAGCTCTGCAGCCAGC 47



RESULT 43 
CNS00418 

LOCUS       CNS00418                 987 bp    DNA     linear   GSS 03-JUN-1999
DEFINITION  Drosophila melanogaster genome survey sequence TET3 end of BAC #
            BACR09C16 of RPCI-98 library from Drosophila melanogaster (fruit
            fly), genomic survey sequence.
ACCESSION   AL066537
VERSION     AL066537.1  GI:4942778
KEYWORDS    GSS.
SOURCE      Drosophila melanogaster (fruit fly)
  ORGANISM  Drosophila melanogaster
            Eukaryota; Metazoa; Arthropoda; Hexapoda; Insecta; Pterygota;
            Neoptera; Endopterygota; Diptera; Brachycera; Muscomorpha;
            Ephydroidea; Drosophilidae; Drosophila.
REFERENCE   1  (bases 1 to 987)
  AUTHORS   Genoscope.
  TITLE     Direct Submission
  JOURNAL   Submitted (02-JUN-1999) Genoscope - Centre National de Sequencage :
            BP 191 91006 EVRY cedex - FRANCE (E-mail : seqref@genoscope.cns.fr
            - Web : www.genoscope.cns.fr)
COMMENT     Determination of this BAC-end sequence was carried out as part of a
            collaboration with the Berkeley Drosophila Genome Project (BDGP).
            The BDGP is constructing a physical map of the Drosophila
            melanogaster genome using these BACs. For further information
            please see http://www.fruitfly.org The BDGP Drosophila
            melanogaster BAC library was prepared by Kazutoyo Osoegawa and
            Aaron Mammoser in Pieter de Jong's laboratory in the Department of
            Cancer Genetics at the Roswell Park Cancer Institute in Buffalo,
            NY. The library is named RPCI-98 and was constructed by partial
            EcoRI digestion of Drosophila DNA provided by the BDGP from the
            isogenic strain y2; cn bw sp, the same strain used for the BDGP's
            P1 and EST libraries. A more detailed description of the library
            and how to order individual BAC clones, the entire library, or
            filters for hybridization from the BACPAC Resource Center can be
            found at http://bacpac.med.buffalo.edu/drosophila_bac.htm.
FEATURES             Location/Qualifiers
     source          1..987
                     /organism="Drosophila melanogaster"
                     /mol_type="genomic DNA"
                     /db_xref="taxon:7227"
                     /clone="BACR09C16"
                     /clone_lib="RPCI-98"
                     /note="end : TET3"
ORIGIN



Query Match 2.5%; Score 38.6; DB 29; Length 987; 

Best Local Similarity 27.5%; Pred. No. 16; 

Matches 53; Conservative 58; Mismatches 82; Indels 0; Gaps 0;

Qy   39 TCTCTTCTTTGGTTTTCTCAGCCATGACCAGTGCTGTTTGTGCCCTTTGTGTGGCCTCCC 98
        | |:|| |||  :|||| :   |     |  : :: ||| | : :|:| |    |:||::
Db  730 TTTYTTTTTTTYYTTTCCYTCTCTCCYTCCYYCYYYTTTYTYYTYTYTTTTCCYCYTCYY 789

Qy   99 CTGCTGTTGGGCTCTCTCTGTCTTTGCTCCTTAGAGCTGGGGCACCTGAGCCCTCCTCTG 158
        |: |: :|   :|:|||:: : ||: |:|:::    |:    : |::   :|::|:
Db  790 CYYCYCYTYYYYTYTCTYYYYTTTYYCYCYYYCYYYCYYCTYYYCYYYYYYCYYCYCTCY 849

Qy  159 TGCCAGCCTTTCTCCCAGCATTCCTYTCTGGCAAACACTTCCTATAAACACACCGTGTGT 218
          |:  :::::||::|  |  :: ::||:  :     ::| :: |     :  : | | |
Db  850 CYCYYYYYYYYCTYYCYYCYCYYTYCTCYCYYTCTTYYYTTYYYTTYYTTYTTYYTTTYT 909

Qy  219 TCTGCCTATTGTC 231
        | |  :| :: ::
Db  910 TTTYTYTTYYYYY 922



RESULT 44 

BX335650/c 

LOCUS       BX335650                1201 bp    mRNA    linear   EST 02-MAY-2003
DEFINITION  BX335650 Homo sapiens PLACENTA COT 25-NORMALIZED Homo sapiens cDNA
            clone CS0DI017YH11 5-PRIME, mRNA sequence.
ACCESSION   BX335650
VERSION     BX335650.1  GI:30343426
KEYWORDS    EST.
SOURCE      Homo sapiens (human)
  ORGANISM  Homo sapiens
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Primates; Catarrhini; Hominidae; Homo.
REFERENCE   1  (bases 1 to 1201)
  AUTHORS   Li,W.B., Gruber,C., Jessee,J. and Polayes,D.
  TITLE     Full-length cDNA libraries and normalization
  JOURNAL   Unpublished (2001)
COMMENT     Contact : Genoscope
            Genoscope - Centre National de Sequencage
            BP 191 91006 EVRY cedex - France
            Email: seqref@genoscope.cns.fr, Web : www.genoscope.cns.fr
            Library was constructed by Life Technologies, a division of
            Invitrogen. Contact : Feng Liang Email : fliang@lifetech.com URL :
            http://fulllength.invitrogen.com/ InVitroGen Corporation 1600
            Faraday Avenue Genoscope sequence ID : CS0DI017CD06QP1.
FEATURES             Location/Qualifiers
     source          1..1201
                     /organism="Homo sapiens"
                     /mol_type="mRNA"
                     /db_xref="taxon:9606"
                     /clone="CS0DI017YH11"
                     /tissue_type="PLACENTA COT 25-NORMALIZED"
                     /clone_lib="Homo sapiens PLACENTA COT 25-NORMALIZED"
                     /note="1st strand cDNA was primed with a NotI-oligo(dT)
                     primer. Five prime end enriched, double-strand cDNA was
                     digested with Not I and cloned into the Not I and EcoR V
                     sites of the pCMVSPORT 6 vector. Library was normalized."
ORIGIN



Query Match 2.5%; Score 38.6; DB 13; Length 1201; 

Best Local Similarity 43.6%; Pred. No. 18; 

Matches 78; Conservative 18; Mismatches 83; Indels 0; Gaps 0; 



Qy   22 CCCATTCCACAGCTGGGTCTCTTCTTTGGTTTTCTCAGCCATGACCAGTGCTGTTTGTGC 81
        |:| ::||    ||   | | || |||  |||:| :  :  |  ||    |  ||| |
Db  835 CYCTKKCCCTTCCTTTTTTTTTTTTTTTTTTTYCCYCCYTTTCCCCCCCCCCTTTTTTTT 776

Qy   82 CCTTTGTGTGGCCTCCCCTGCTGTTGGGCTCTCTCTGTCTTTGCTCCTTAGAGCTGGGGC 141
         :||| | |    |||| |  | ||    | | ||  | ||||  |::: |   ::|::|
Db  775 TSTTTTTTTYCTTTCCCTTCTTTTTTTCTTTTTTCCTTTTTTGGCCYKKGGCTKKKGKKC 716

Qy  142 ACCTGAGCCCTCCTCTGTGCCAGCCTTTCTCCCAGCATTCCTYTCTGGCAAACACTTCC 200
          :|   ||| || || |: |  ||   | |||    || :|:| |  |   | | |||
Db  715 YKBTTCCCCCCCCCCTTTKGCCCCCCCSCCCCCTTKTTTTYTTTTTTCCTTCCTCCTCC 657



RESULT 45 

AA525033 

LOCUS       AA525033                 407 bp    mRNA    linear   EST 05-AUG-1997
DEFINITION  nh36c06.s1 NCI_CGAP_Pr3 Homo sapiens cDNA clone IMAGE:954442, mRNA
            sequence.
ACCESSION   AA525033
VERSION     AA525033.1  GI:2265961
KEYWORDS    EST.
SOURCE      Homo sapiens (human)
  ORGANISM  Homo sapiens
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Primates; Catarrhini; Hominidae; Homo.
REFERENCE   1  (bases 1 to 407)
  AUTHORS   NCI-CGAP http://www.ncbi.nlm.nih.gov/ncicgap.
  TITLE     National Cancer Institute, Cancer Genome Anatomy Project (CGAP),
            Tumor Gene Index
  JOURNAL   Unpublished (1997)
COMMENT     Contact: Robert Strausberg, Ph.D.
            Email: cgapbs-r@mail.nih.gov
            Tissue Procurement: W. Marston Linehan, M.D., Rodrigo Chuaqui,
            M.D., Michael Emmert-Buck, M.D., Ph.D.
            cDNA Library Preparation: David B. Krizman, Ph.D.
            cDNA Library Arrayed by: Genome Systems Inc., Greg Lennon, Ph.D.
            DNA Sequencing by: Washington University Genome Sequencing Center
            Clone distribution: NCI-CGAP clone distribution information can be
            found through the I.M.A.G.E. Consortium/LLNL at:
            www-bio.llnl.gov/bbrp/image/image.html
            Insert Length: 507 Std Error: 0.00
            Seq primer: -40m13 fwd. ET from Amersham.
FEATURES             Location/Qualifiers
     source          1..407
                     /organism="Homo sapiens"
                     /mol_type="mRNA"
                     /db_xref="taxon:9606"
                     /clone="IMAGE:954442"
                     /sex="Male"
                     /dev_stage="45 years old"
                     /lab_host="DH10B"
                     /clone_lib="NCI_CGAP_Pr3"
                     /note="Vector: pAMP10; Site_1: NotI; Site_2: EcoRI; 1st
                     strand cDNA was primed with oligo(dT)17 on 50 ng of
                     DNAse-treated, total cellular RNA obtained from
                     5,000-10,000 microdissected cells
                     histologically-determined to be fully malignant prostate
                     cancer cells. Double-stranded cDNA was ligated to EcoRI
                     adaptors, 5 cycles of PCR applied to the cDNA with an
                     adaptor-specific primer, and the resulting PCR product
                     subcloned into pAMP10 by the UDG-cloning method (Life
                     Technologies). Average insert size is 600 bp. NOTE: Not
                     directionally cloned. This library was constructed by
                     David Krizman."
ORIGIN



Query Match 2.4%; Score 38.4; DB 9; Length 407;

Best Local Similarity 49.5%; Pred. No. 10;

Matches 99; Conservative 0; Mismatches 101; Indels 0; Gaps 0;

Qy  741 TTAAAACATGCCCTGCCTGAAGCCGCTTGCTGCTTCTCACTGATTTCTGCTCTCCCCTTC 800
        |||| | |  |  ||  |||  | ||  | |   | |  | || | || |||| |  ||
Db  108 TTAATATACTCTATGGATGACCCAGCAAGTTTGCTGTTTCAGAATCCTCCTCTTCTGTTT 167

Qy  801 CTTGACTCGCCCACCACCTGTCCTGTGTAGATGGAGAAGGCTCGGAGAGTGGGGGTGCTG 860
         ||||     | |  ||      ||||  |   ||   ||| |  |||||| |  | ||
Db  168 TTTGAACTTTCGAAAACAAAAGATGTGCTGGGAGACGCGGCCCCTAGAGTGTGCTTACTC 227

Qy  861 GGGGCACAAAATGGAATGAACACTGCTGAAGGAATGCAGGGTTCACTTCAAGAAGAAAGC 920
          ||  |   || |     ||  || |   |||| |   | |  |   ||||| |  | |
Db  228 CAGGTCCTTGATTGTCCAGACTGTGGAGGGGGAAGGGCAGATCTATGCCAAGAGGGGAAC 287

Qy  921 AGTGTGCAGGTGTACCATCT 940
        ||  || ||  |   || ||
Db  288 AGGCTGTAGAGGCCACAGCT 307



RESULT 46 

AA524916 

LOCUS       AA524916                 412 bp    mRNA    linear   EST 05-AUG-1997
DEFINITION  nh31a09.s1 NCI_CGAP_Pr3 Homo sapiens cDNA clone IMAGE:953944, mRNA
            sequence.
ACCESSION   AA524916
VERSION     AA524916.1  GI:2265844
KEYWORDS    EST.
SOURCE      Homo sapiens (human)
  ORGANISM  Homo sapiens
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Primates; Catarrhini; Hominidae; Homo.
REFERENCE   1  (bases 1 to 412)
  AUTHORS   NCI-CGAP http://www.ncbi.nlm.nih.gov/ncicgap.
  TITLE     National Cancer Institute, Cancer Genome Anatomy Project (CGAP),
            Tumor Gene Index
  JOURNAL   Unpublished (1997)
COMMENT     Contact: Robert Strausberg, Ph.D.
            Email: cgapbs-r@mail.nih.gov
            Tissue Procurement: W. Marston Linehan, M.D., Rodrigo Chuaqui,
            M.D., Michael Emmert-Buck, M.D., Ph.D.
            cDNA Library Preparation: David B. Krizman, Ph.D.
            cDNA Library Arrayed by: Genome Systems Inc., Greg Lennon, Ph.D.
            DNA Sequencing by: Washington University Genome Sequencing Center
            Clone distribution: NCI-CGAP clone distribution information can be
            found through the I.M.A.G.E. Consortium/LLNL at:
            www-bio.llnl.gov/bbrp/image/image.html
            Insert Length: 501 Std Error: 0.00
            Seq primer: -40m13 fwd. ET from Amersham
            High quality sequence stop: 340.
FEATURES             Location/Qualifiers
     source          1..412
                     /organism="Homo sapiens"
                     /mol_type="mRNA"
                     /db_xref="taxon:9606"
                     /clone="IMAGE:953944"
                     /sex="Male"
                     /dev_stage="45 years old"
                     /lab_host="DH10B"
                     /clone_lib="NCI_CGAP_Pr3"
                     /note="Vector: pAMP10; Site_1: NotI; Site_2: EcoRI; 1st
                     strand cDNA was primed with oligo(dT)17 on 50 ng of
                     DNAse-treated, total cellular RNA obtained from
                     5,000-10,000 microdissected cells
                     histologically-determined to be fully malignant prostate
                     cancer cells. Double-stranded cDNA was ligated to EcoRI
                     adaptors, 5 cycles of PCR applied to the cDNA with an
                     adaptor-specific primer, and the resulting PCR product
                     subcloned into pAMP10 by the UDG-cloning method (Life
                     Technologies). Average insert size is 600 bp. NOTE: Not
                     directionally cloned. This library was constructed by
                     David Krizman."
ORIGIN



Query Match 2.4%; Score 38.4; DB 9; Length 412; 

Best Local Similarity 49.5%; Pred. No. 10; 

Matches 99; Conservative 0; Mismatches 101; Indels 0; Gaps 0; 

Qy 741 TTAAAACATGCCCTGCCTGAAGCCGCTTGCTGCTTCTCACTGATTTCTGCTCTCCCCTTC 800 

I I I I I I I II III III II II I I I I I I I I I I I II 

Db 108 TTAATATACTCTATGGATGACCCAGCAAGTTTGCTGTTTCAGAATCCTCCTCTTCTGTTT 167

Qy 801 CTTGACTCGCCCACCACCTGTCCTGTGTAGATGGAGAAGGCTCGGAGAGTGGGGGTGCTG 860 

I I I I II II I I I I I II I I I I I I I I I I I III 

Db 168 TTTGAACTTTCGAAAACAAAAGATGTGCTGGGAGACGCGGCCCCTAGAGTGTGCTTACTC 227 

Qy 861 GGGGCACAAAATGGAATGAACACTGCTGAAGGAATGCAGGGTTCACTTCAAGAAGAAAGC 920

II I III II II I I I M I M I I I M I I II 

Db 228 CAGGTCCTTGATTGTCCAGACTGTGGAGGGGGAAGGGCAGATCTATGCCAAGAGGGGAAC 287

Qy 921 AGTGTGCAGGTGTACCATCT 940

II I I II I I I I I 

Db 288 AGGCTGTAGAGGCCACAGCT 307 



RESULT 47 

CF486702 

LOCUS 

DEFINITION 

ACCESSION 
VERSION 
KEYWORDS 
SOURCE 

ORGANISM 



REFERENCE 
AUTHORS 



TITLE 



CF486702 472 bp mRNA linear EST 08-SEP-2003 

POL1_39_F01.b1_A002 Pollen Sorghum bicolor cDNA clone
POL1_39_F01_A002 3', mRNA sequence. 
CF486702 

CF486702.1 GI:34515571
EST. 

Sorghum bicolor (sorghum) 
Sorghum bicolor 

Eukaryota; Viridiplantae; Streptophyta; Embryophyta; Tracheophyta;
Spermatophyta; Magnoliophyta; Liliopsida; Poales; Poaceae; PACCAD 
clade; Panicoideae; Andropogoneae; Sorghum. 
1 (bases 1 to 472) 

Cordonnier-Pratt,M.-M., Suzuki,Y., Sugano,S., Klein,R.R., Liang,C.,
Sun,F., Sullivan,R., Eastman,A., Cannon,R., Kern,B., Morgan,J.,
Lucas,A., Al-Sheikh,A., Jones,V., Adibi,N., Owen,A., Gao,J. and
Pratt,L.H.

EST database from Sorghum: pollen



JOURNAL 
COMMENT 



FEATURES 

source 



Unpublished (2003) 

Other_ESTs: POL1_39_F01.g1_A002

Contact: Cordonnier-Pratt MM 

Laboratory for Genomics and Bioinformatics

The University of Georgia, Department of Plant Biology 

Plant Sciences Building, Rm. 2502, Athens, GA 30602-7271, USA 

Tel: 706 542 1860 

Fax: 706 583 0210 

Email: mmpratt@uga.edu

Library constructed by Dr. Yutaka Suzuki and Dr. Sumio Sugano in 
the Human Genome Center, University of Tokyo Institute of Medical 
Science; plant material and RNA prepared at Texas A & M University; 
sequencing done in the Laboratory for Genomics and Bioinformatics,
University of Georgia. Sequence ends have been trimmed to exclude 
vector and regions below Phred quality 16. Three-prime sequences 
are presented as their reverse complement and have been trimmed to 
exclude polyA. 

Seq primer: Sug3 (CGACCTGCAGCTCGAGCACA) 
POLYA=Yes.

Location/Qualifiers

1..472

/organism="Sorghum bicolor"
/mol_type="mRNA"
/cultivar="BTx623"
/db_xref="taxon:4558"
/clone="POL1_39_F01_A002"
/lab_host="DH10B-T1 phage-resistant E. coli"
/clone_lib="Pollen"
/note="Organ: Pollen; Vector: pME18S-FL3; Site_1: XhoI;
Site_2: XhoI; The library was prepared from polyA+ RNA
from pollen at the late vacuolated-vacuolated stage of
development. Pollen was harvested from greenhouse-grown
panicles of sorghum line BTx623. Panicles were removed
from the flag leaf prior to emergence, when no detectable
amylase is present in pollen of male-fertile lines. This
stage represents pollen collected from anthers about 8-14
days prior to anthesis. Double-stranded cDNA was cloned
unidirectionally into different DraIII sites of the
pME18S-FL3 vector (5-prime DraIII site is CACTGTGTG,
3-prime DraIII site is CACCATGTG). XhoI excises the cDNA
insert."



ORIGIN 



Query Match 2.4%; Score 38.4; DB 14; Length 472; 

Best Local Similarity 50.5%; Pred. No. 11; 
Matches 93; Conservative 0; Mismatches 91; Indels 0; Gaps 0;



Qy 890 AGGAATGCAGGGTTCACTTCAAGAAGAAAGCAGTGTGCAGGTGTACCATCTCCCAGTCAG 949

M I I I I I I M I Mill I II I II II I I II II | || 

Db 188 AGTGATGCAGTCCTCCCATCAAGGACAATGGAGAACTCAAGAGAACATTCGTCACGTGGT 247

Qy 950 AGACCCAGTAATCAGAGCAGCTAATGGGAGGCATGCTCCTTGGGTGGTGGCCAACTTGTC 1009 

M I I I I I I I I I I III I I I I I I I I I I I I 

Db 248 GGATTGGGTGATTGCATCAGTTTTTTTTAAGCGAGGATTCTGGGAGGAGGCAAATCGTGG 307

Qy 1010 ATTATACCTCCAAGGACAACAGAGTGGTACATAAGGCTAAAACAGAGTTGTCAACCTGTC 1069

I I I I I I I I I I I I I I I I I I I I I I I I I I I I II 



Db 308 GCTATGGCGTGCAGGAAAAAGGAGGGGTGCATGAAGCGTTTACGGAAGGGGTGAAGCGTT 367



Qy 1070 CAGG 1073 

I i I 

Db 368 TAGG 371 



RESULT 48 

CNS006ON 

LOCUS 

DEFINITION 



ACCESSION 
VERSION 
KEYWORDS 
SOURCE 

ORGANISM 



REFERENCE 
AUTHORS 
TITLE 
JOURNAL 



COMMENT 



FEATURES 

source 



CNS006ON 910 bp DNA linear GSS 03-JUN-1999 

Drosophila melanogaster genome survey sequence T7 end of BAC # 
BACR14J21 of RPCI-98 library from Drosophila melanogaster (fruit 
fly), genomic survey sequence. 
AL065629 

AL065629.1 GI:4944698
GSS. 

Drosophila melanogaster (fruit fly) 
Drosophila melanogaster 

Eukaryota; Metazoa; Arthropoda; Hexapoda; Insecta; Pterygota;
Neoptera; Endopterygota; Diptera; Brachycera; Muscomorpha;
Ephydroidea; Drosophilidae; Drosophila.

1 (bases 1 to 910) 

Genoscope.

Direct Submission 

Submitted (02-JUN-1999) Genoscope - Centre National de Sequencage :
BP 191 91006 EVRY cedex - FRANCE (E-mail : seqref@genoscope.cns.fr 
- Web : www.genoscope.cns.fr) 

Determination of this BAC-end sequence was carried out as part of a 
collaboration with the Berkeley Drosophila Genome Project (BDGP).
The BDGP is constructing a physical map of the Drosophila 
melanogaster genome using these BACs. For further information
please see http://www.fruitfly.org The BDGP Drosophila 
melanogaster BAC library was prepared by Kazutoyo Osoegawa and 
Aaron Mammoser in Pieter de Jong's laboratory in the Department of 
Cancer Genetics at the Roswell Park Cancer Institute in Buffalo, 
NY. The library is named RPCI-98 and was constructed by partial 
EcoRI digestion of Drosophila DNA provided by the BDGP from the 
isogenic strain y2; cn bw sp, the same strain used for the BDGP's
P1 and EST libraries. A more detailed description of the library
and how to order individual BAC clones, the entire library, or 
filters for hybridization from the BACPAC Resource Center can be 
found at http://bacpac.med.buffalo.edu/drosophila_bac.htm. 

Location/Qualifiers 

1..910

/organism="Drosophila melanogaster"
/mol_type="genomic DNA"
/db_xref="taxon:7227"
/clone="BACR14J21"
/clone_lib="RPCI-98"
/note="end : T7"



ORIGIN 



Query Match 2.4%; Score 38.4; DB 29; Length 910; 

Best Local Similarity 22.7%; Pred. No. 17; 

Matches 34; Conservative 68; Mismatches 48; Indels 0; Gaps 0; 



Qy 41 TCTTCTTTGGTTTTCTCAGCCATGACCAGTGCTGTTTGTGCCCTTTGTGTGGCCTCCCCT 100



I : : I | | : | | : | : : : : : : : : : | | | : | : | : : | : : : : : : : : : | 

Db 745 TBYTKSTTSMTSTYTTBBSTSKBSTBTSTBKSTGTKTBTSBTTSCTSSSSSBSTSYSYST 804 

Qy 101 GCTGTTGGGCTCTCTCTGTCTTTGCTCCTTAGAGCTGGGGCACCTGAGCCCTCCTCTGTG 160 

: I : : : : : : : I : : : : | | : I : I | : : : : : : : : : I : : : : I : : I I : I I : 
Db 805 SCBSSBSBSSTSYSBCTSTSTSTSSBBSSBSSSSSCGTSBTSSSTSTTSTCTSTTCKTST 864 

Qy 161 CCAGCCTTTCTCCCAGCATTCCTYTCTGGC 190 

: :: I |:|: I :|:|:|::: 

Db 865 GBSSYGTGTSTYTTBTTTATTSTSTSTSBB 894 



RESULT 49

BZ850575/c 

LOCUS 

DEFINITION 

ACCESSION 
VERSION 
KEYWORDS 
SOURCE 

ORGANISM 



REFERENCE 
AUTHORS 



TITLE 
JOURNAL 
COMMENT 



FEATURES 

source 



BZ850575 472 bp DNA linear GSS 18-MAR-2003

CH240_280J20.TV CHORI-240 Bos taurus genomic clone CH240_280J20,
genomic survey sequence.
BZ850575

BZ850575.1 GI:29077978
GSS.

Bos taurus (cow)
Bos taurus

Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
Mammalia; Eutheria; Cetartiodactyla; Ruminantia; Pecora; Bovoidea;
Bovidae; Bovinae; Bos.
1 (bases 1 to 472)
Zhao,S., Shetty,J., Shatsman,S., Tsegaye,G., Geer,K.,
Shvartsbeyn,A., Gebregeorgis,E., Chen,D., Riggs,F., de Jong,P.,
Crawford,A.M. and McEwan,J.C.

Bovine BAC End Sequences from Library CHORI-240 

Unpublished (2003) 

Contact: Shaying Zhao

Department of Eukaryotic Genomics 

The Institute for Genomic Research 

9712 Medical Center Dr., Rockville, MD 20850, USA 
Tel: 301 838 0200 
Fax: 301 838 0208 
Email: szhao@tigr.org

Clones are derived from the bovine BAC library CHORI-240
(http://www.chori.org/bacpac/bovine240.htm). For BAC library 
availability, please contact Pieter de Jong (pdejong@mail.cho.org). 
Clones may be purchased from BACPAC Resources 

(http://www.chori.org/bacpac/ordering_information.htm). This work 
was undertaken as part of the International Bovine BAC Mapping 
Consortium (IBBMC) by AgResearch Ltd., New Zealand and The 
Institute of Genomic Research (TIGR), USA.
Plate: 280 row: J column: 20 
Seq primer: T7 
Class: BAC ends. 

Location/Qualifiers 

1..472

/organism="Bos taurus"
/mol_type="genomic DNA"
/strain="breed: Hereford"
/db_xref="taxon:9913"
/clone="CH240_280J20"
/sex="Male"
/cell_type="Blood"
/clone_lib="CHORI-240"
/note="Vector: pTARBAC1.3; Site_1: MboI; Site_2: MboI;
Hereford bull L1 Domino 99375; CHORI-240 Bovine BAC
library (Male) produced by Pieter de Jong"

ORIGIN 

Query Match 2.4%; Score 38.2; DB 28; Length 472; 

Best Local Similarity 56.9%; Pred. No. 13; 

Matches 70; Conservative 0; Mismatches 53; Indels 0; Gaps 0; 

Qy 608 TGCCTTTTCTGAATGGGTGGGTGGGCCTGTGGGCTTTGGGTTGTCTGTCCAGCAGATCAG 667 

II I I I I I I III III I I I II I I I I I I I M II II 

Db 413 TGGTTTCTCTGGTTGGCGGGGGAGGGGGTGGAATTTCTGCAGTTCTGTACAGGGGAGGAG 354 

Qy 668 GGTGAAAGTGGACAGTCTGTAACAACAGTGAGTCGTTCCTCCTCCTCCTCCTGCGCAGGG 727 

I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I 

Db 353 GGAGAAACTAGGACCTCTGAAACAGGAGTGTGGCTGTGCCCCCTGGAGTCCTGCAAAGGG 294 

Qy 728 CAG 730 

I I 

Db 293 GAG 291 



RESULT 50 

BZ849786/c

LOCUS 

DEFINITION 

ACCESSION 
VERSION 
KEYWORDS 
SOURCE 

ORGANISM 



REFERENCE 
AUTHORS 



TITLE 
JOURNAL 
COMMENT 



BZ849786 560 bp DNA linear GSS 18-MAR-2003 

CH240_280D12.TV CHORI-240 Bos taurus genomic clone CH240_280D12, 
genomic survey sequence. 
BZ849786 

BZ849786.1 GI:29077187 
GSS. 

Bos taurus (cow) 
Bos taurus 

Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
Mammalia; Eutheria; Cetartiodactyla; Ruminantia; Pecora; Bovoidea;
Bovidae; Bovinae; Bos. 
1 (bases 1 to 560) 

Zhao,S., Shetty,J., Shatsman,S., Tsegaye,G., Geer,K.,
Shvartsbeyn,A., Gebregeorgis,E., Chen,D., Riggs,F., de Jong,P.,
Crawford,A.M. and McEwan,J.C.

Bovine BAC End Sequences from Library CHORI-240 

Unpublished (2003) 

Contact: Shaying Zhao 

Department of Eukaryotic Genomics 

The Institute for Genomic Research 

9712 Medical Center Dr., Rockville, MD 20850, USA 
Tel: 301 838 0200 
Fax: 301 838 0208 
Email: szhao@tigr.org 

Clones are derived from the bovine BAC library CHORI-240 
(http://www.chori.org/bacpac/bovine240.htm). For BAC library 
availability, please contact Pieter de Jong (pdejong@mail.cho.org). 
Clones may be purchased from BACPAC Resources 

(http://www.chori.org/bacpac/ordering_information.htm). This work
was undertaken as part of the International Bovine BAC Mapping 
Consortium (IBBMC) by AgResearch Ltd., New Zealand and The 



FEATURES 

source 



Institute of Genomic Research (TIGR), USA.
Plate: 280 row: D column: 12
Seq primer: T7
Class: BAC ends.

Location/Qualifiers

1..560

/organism="Bos taurus"
/mol_type="genomic DNA"
/strain="breed: Hereford"
/db_xref="taxon:9913"
/clone="CH240_280D12"
/sex="Male"
/cell_type="Blood"
/clone_lib="CHORI-240"
/note="Vector: pTARBAC1.3; Site_1: MboI; Site_2: MboI;
Hereford bull L1 Domino 99375; CHORI-240 Bovine BAC
library (Male) produced by Pieter de Jong"



ORIGIN



Query Match 2.4%; Score 38.2; DB 28; Length 560; 

Best Local Similarity 56.9%; Pred. No. 14; 

Matches 70; Conservative 0; Mismatches 53; Indels 0; Gaps 0; 

Qy 608 TGCCTTTTCTGAATGGGTGGGTGGGCCTGTGGGCTTTGGGTTGTCTGTCCAGCAGATCAG 667 

M M I I I I III III II I II I I II I I I I I II II 

Db 436 TGGGTTCTCTGGTTGGCGGGGGAGGGGGTGGAATTTCTGCAGTTCTGTACAGGGGAGGAG 377 

Qy 668 GGTGAAAGTGGACAGTCTGTAACAACAGTGAGTCGTTCCTCCTCCTCCTCCTGCGCAGGG 727 

I I I I M I I I I I I I I I I I I I I I I I I I I MINI I I I I 

Db 376 GGAGAAACTAGGACCTCTGAAACAGGAGTGTGGCTGTGCCCCCTGGAGTCCTGCAAAGGG 317



Qy 728 CAG 730

I I

Db 316 GAG 314



Search completed: April 29, 2004, 18:39:37 
Job time : 5121.38 secs



GenCore version 5.1.6 
Copyright (c) 1993 - 2004 Compugen Ltd. 



OM nucleic - nucleic search, using sw model 



Run on:          April 29, 2004, 14:53:09 ; Search time 6697.85 Seconds
                 (without alignments)
                 10159.758 Million cell updates/sec

Title:           US-09-989-981A-9_COPY_3436_5005
Perfect score:   1570
Sequence:        1 cgaagcatcctgaagtacag ctagagagcaaacccagagc 1570

Scoring table:   IDENTITY_NUC
                 Gapop 10.0 , Gapext 1.0

Searched:        3470272 seqs, 21671516995 residues

Total number of hits satisfying chosen parameters:   6940544

Minimum DB seq length: 0
Maximum DB seq length: 2000000000

Post-processing: Minimum Match 0%
                 Maximum Match 100%
                 Listing first 50 summaries

Database :       GenEmbl:*

1: gb_ba:*
2: gb_htg:*
3: gb_in:*
4: gb_om:*
5: gb_ov:*
6: gb_pat:*
7: gb_ph:*
8: gb_pl:*
9: gb_pr:*
10: gb_ro:*
11: gb_sts:*
12: gb_sy:*
13: gb_un:*
14: gb_vi:*
15: em_ba:*
16: em_fun:*
17: em_hum:*
18: em_in:*
19: em_mu:*
20: em_om:*
21: em_or:*
22: em_ov:*
23: em_pat:*
24: em_ph:*
25: em_pl:*
26: em_ro:*
27: em_sts:*
28: em_un:*
29: em_vi:*
30: em_htg_hum:*
31: em_htg_inv:*
32: em_htg_other:*
33: em_htg_mus:*
34: em_htg_pln:*
35: em_htg_rod:*
36: em_htg_mam:*
37: em_htg_vrt:*
38: em_sy:*
39: em_htgo_hum:*
40: em_htgo_mus:*
41: em_htgo_other:*



Pred. No. is the number of results predicted by chance to have a 
score greater than or equal to the score of the result being printed, 
and is derived by analysis of the total score distribution. 

SUMMARIES

                    %
Result              Query
  No.  Score        Match  Length       DB           ID           Description

  1    1568         99.9   6043         6            AX685737     AX685737 Sequence
  2    965.4        61.5   1000         10           F351786S01   AF351786 Mus muscu
c 3    [illegible]  55.3   237445       2            AC120701     AC120701 Rattus no
  4    [illegible]  55.3   [illegible]  2            [illegible]  [illegible] Rattus no
  5    [illegible]  55.3   [illegible]  10           [illegible]  [illegible] Rattus no
c 6    [illegible]  35.4   [illegible]  10           AF404108     AF404108 Mus muscu
c 7    412.4        26.3   588          10           AF404109     AF404109 Rattus no
  8    402.6        25.6   463          10           F351786S02   AF351787 Mus muscu
c 9    398.4        25.4   1314         10           F351799S01   AF351799 Mus muscu
  10   358.6        22.8   359          6            AX685738     AX685738 Sequence
c 11   299.4        19.1   185045       2            AC146466     AC146466 Callithri
c 12   298.8        19.0   178016       2            AC146787     AC146787 Aotus nan
  13   284.4        18.1   2351         10           AY195873     AY195873 Mus muscu
  14   284.4        18.1   2354         6            AX456524     AX456524 Sequence
  15   284.4        18.1   2354         10           AF312713     AF312713 Mus muscu
  16   282.8        18.0   2351         10           AY195872     AY195872 Mus muscu
c 17   278.6        17.7   127066       9            AC084265     AC084265 Homo sapi
c 18   278.6        17.7   139342       9            AC108476     AC108476 Homo sapi
  19   275.4        17.5   159346       2            AC145533     AC145533 Lemur cat
c 20   261.8        16.7   207760       2            AC146286     AC146286 Callicebu
  21   244.8        15.6   4899         9            AF404106     AF404106 Homo sapi
  22   242.8        15.5   5459         6            AX456521     AX456521 Sequence
c 23   241.2        15.4   2809         9            F351812S01   AF351812 Homo sapi
  24   238.8        15.2   202533       2            AC146464     AC146464 Saimiri s
  25   215          13.7   2512         6            AX747300     AX747300 Sequence
  26   215          13.7   2512         9            AK091997     AK091997 Homo sapi
  27   191.4        12.2   2258         6            AX320881     AX320881 Sequence
c 28   179.8        11.5   581          9            AF404107     AF404107 Homo sapi
  29   174          11.1   68166        2            AC084712     AC084712 Homo sapi
  30   173.6        11.1   2470         10           AF312714     AF312714 Rattus no
c 31   164          10.4   2284         10           AY196216     AY196216 Mus muscu
c 32   164          10.4   2285         10           AY196215     AY196215 Mus muscu
c 33   164          10.4   3674         10           AF324495     AF324495 Mus muscu
  34   151.2        9.6    226          6            BD223287     BD223287 [illegible]
  35   150.8        9.6    235          6            AR121818     AR121818 [illegible]
  36   146.4        9.3    1915         6            AX456523     AX456523 Sequence
  37   146.4        9.3    1959         6            AX685729     AX685729 [illegible]
c 38   145.4        9.3    68166        2            AC084712     AC084712 Homo sapi
  39   135.8        8.6    2035         6            AX456526     AX456526 Sequence
  40   107          6.8    2516         6            AX456520     AX456520 Sequence
  41   107          6.8    2740         9            AF312715     AF312715 Homo sapi
  42   101.6        6.5    249          6            AX320886     AX320886 [illegible]
  43   101.6        6.5    2340         6            AX320883     AX320883 Sequence
  44   101.6        6.5    2340         [illegible]  AX685733     AX685733 Sequence
  45   101.6        6.5    2340         9            AF320293     AF320293 Homo sapi
  46   93           5.9    1920         6            AX456519     AX456519 Sequence
  47   90           5.7    122          6            AX320887     AX320887 Sequence
c 48   84           5.4    4829         10           AF351785     AF351785 Rattus no
c 49   67.6         4.3    135280       2            AC146282     AC146282 Takifugu
c 50   63           4.0    2019         6            AX685731     AX685731 Sequence


ALIGNMENTS 



RESULT 1 
AX685737 
LOCUS 

DEFINITION 

ACCESSION 

VERSION 

KEYWORDS 

SOURCE 

ORGANISM 



REFERENCE 
AUTHORS 
TITLE 
JOURNAL 



AX685737 6043 bp DNA linear PAT 29-MAR-2003 

Sequence 9 from Patent WO02081691. 

AX685737 

AX685737.1 GI:29371746

Homo sapiens (human) 
Homo sapiens 

Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;

Mammalia; Eutheria; Primates; Catarrhini; Hominidae; Homo. 

1 

Hobbs,H.H., Shan,B., Barnes, R. and Tian,H. 
Abcg5 and abcg8: compositions and methods of use
Patent: WO 02081691-A 9 17-OCT-2002; 

Tularik Inc. (US); BOARD OF REGENTS UNIVERSITY OF TEXAS SYSTEM
(US) 



FEATURES 

source 



Location/Qualifiers 
1..6043

/organism="Homo sapiens" 
/mol_type="unassigned DNA"
/db_xref="taxon:9606"

/note="ABCG8 exon 2 (reverse strand) through ABCG5 exon 2 
(forward strand)" 



ORIGIN 



Query Match 99.9%; Score 1568; DB 6; Length 6043; 

Best Local Similarity 100.0%; Pred. No. 0; 

Matches 1570; Conservative 0; Mismatches 0; Indels 0; Gaps 0;



Qy 1 CGAAGCATCCTGAAGTACAGTCCCATTCCACAGCTGGGTCTCTTCTTTGGTTTTCTCAGC 60 

I I I I M I I I I I I I I I I I I I II I I I I I I I I I I I M I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 3436 CGAAGCATCCTGAAGTACAGTCCCATTCCACAGCTGGGTCTCTTCTTTGGTTTTCTCAGC 3495 



Qy 61 CATGACCAGTGCTGTTTGTGCCCTTTGTGTGGCCTCCCCTGCTGTTGGGCTCTCTCTGTC 120



Db 3496 CATGACCAGTGCTGTTTGTGCCCTTTGTGTGGCCTCCCCTGCTGTTGGGCTCTCTCTGTC 3555

Qy 121 TTTGCTCCTTAGAGCTGGGGCACCTGAGCCCTCCTCTGTGCCAGCCTTTCTCCCAGCATT 180 

I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 1 
Db 3556 TTTGCTCCTTAGAGCTGGGGCACCTGAGCCCTCCTCTGTGCCAGCCTTTCTCCCAGCATT 3615

Qy 181 CCTYTCTGGCAAACACTTCCTATAAACACACCGTGTGTTCTGCCTATTGTCGAGATAAGG 240

I M I I I I I I I I I I I M I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I M I I I I 
Db 3616 CCTYTCTGGCAAACACTTCCTATAAACACACCGTGTGTTCTGCCTATTGTCGAGATAAGG 3675 

Qy 241 ACACTCTGGCTAAAGGTACATCAGATAATGGCATCGTTGGCCAAATTGGTGAACTGTTAT 300

I I I I I I M I I I I I II I I I II I I I I I I I I I I I I I I I I I I I I II I II I I I I I I I I I I I | | | | 
Db 3676 ACACTCTGGCTAAAGGTACATCAGATAATGGCATCGTTGGCCAAATTGGTGAACTGTTAT 3735

Qy 301 CTCACGAGGATTCCAGGGCTGGGTAGGATCGGACAGGGCACTCCCATTGGCTCCTCAGTT 360 

I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I || | 1 I 
Db 3736 CTCACGAGGATTCCAGGGCTGGGTAGGATCGGACAGGGCACTCCCATTGGCTCCTCAGTT 3795

Qy 361 AAAGCTGCCCTGGAGCCGGACAGGCCACTAGAAAATTCACTTGCATTTGCTTCCTGCTAG 420 

1 I I I I I M I I I I I I I I I I I I I I I I I I I I I I II I I I II II I I I I I I I I I I I I I I I II I I I I 
Db 3796 AAAGCTGCCCTGGAGCCGGACAGGCCACTAGAAAATTCACTTGCATTTGCTTCCTGCTAG 3855

Qy 421 CCATGGGTGAGCTGCCCTTTCTGAGTCCAGAGGGAGCCAGAGGGCCTCACATCAACAGAG 480

I I M I I I I I I I I I I I I I I I I I I I II I I I I M I I I II I I I I I I I I II I I I I I I I I I I I I I I 
Db 3856 CCATGGGTGAGCTGCCCTTTCTGAGTCCAGAGGGAGCCAGAGGGCCTCACATCAACAGAG 3915

Qy 481 GGTCTCTGAGCTCCCTGGAGCAAGGTTCGGTCACGGGCACAGAGGCTCGGCACAGCTTAG 540

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I II I I I I I I I I I I I I I I I I I 
Db 3916 GGTCTCTGAGCTCCCTGGAGCAAGGTTCGGTCACGGGCACAGAGGCTCGGCACAGCTTAG 3975

Qy 541 GTGTCCTGCATGTGTCCTACAGCGTCAGGTAAGGGGACCTCCACAGCAAAAAGCTAGGCT 600

I I I I I I M I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I 

Db 3976 GTGTCCTGCATGTGTCCTACAGCGTCAGGTAAGGGGACCTCCACAGCAAAAAGCTAGGCT 4035 

Qy 601 CTCTGATTGCCTTTTCTGAATGGGTGGGTGGGCCTGTGGGCTTTGGGTTGTCTGTCCAGC 660 

II I I I I I I I I I M II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

Db 4036 CTCTGATTGCCTTTTCTGAATGGGTGGGTGGGCCTGTGGGCTTTGGGTTGTCTGTCCAGC 4095

Qy 661 AGATCAGGGTGAAAGTGGACAGTCTGTAACAACAGTGAGTCGTTCCTCCTCCTCCTCCTG 720

I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I 
Db 4096 AGATCAGGGTGAAAGTGGACAGTCTGTAACAACAGTGAGTCGTTCCTCCTCCTCCTCCTG 4155

Qy 721 CGCAGGGCAGAGCCTGGACATTAAAACATGCCCTGCCTGAAGCCGCTTGCTGCTTCTCAC 780 

I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I 
Db 4156 CGCAGGGCAGAGCCTGGACATTAAAACATGCCCTGCCTGAAGCCGCTTGCTGCTTCTCAC 4215 

Qy 781 TGATTTCTGCTCTCCCCTTCCTTGACTCGCCCACCACCTGTCCTGTGTAGATGGAGAAGG 840

I I I I I I I I I I I I I I I I I I I I I II I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I 1 I 
Db 4216 TGATTTCTGCTCTCCCCTTCCTTGACTCGCCCACCACCTGTCCTGTGTAGATGGAGAAGG 4275 

Qy 841 CTCGGAGAGTGGGGGTGCTGGGGGCACAAAATGGAATGAACACTGCTGAAGGAATGCAGG 900 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I II I I I I I I I I I I I I 
Db 4276 CTCGGAGAGTGGGGGTGCTGGGGGCACAAAATGGAATGAACACTGCTGAAGGAATGCAGG 4335 

Qy 901 GTTCACTTCAAGAAGAAAGCAGTGTGCAGGTGTACCATCTCCCAGTCAGAGACCCAGTAA 960

I I I I I I I I I I I I I I I M I I I I I I I I I I I I I I I I I I I I I I | | | | | | | | | | I | | | | | || I I | 



Db 4336 GTTCACTTCAAGAAGAAAGCAGTGTGCAGGTGTACCATCTCCCAGTCAGAGACCCAGTAA 4395



Qy 961 TCAGAGCAGCTAATGGGAGGCATGCTCCTTGGGTGGTGGCCAACTTGTCATTATACCTCC 1020

M I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I | | | | | | | | | | | | | | | | | 
Db 4396 TCAGAGCAGCTAATGGGAGGCATGCTCCTTGGGTGGTGGCCAACTTGTCATTATACCTCC 4455

Qy 1021 AAGGACAACAGAGTGGTACATAAGGCTAAAACAGAGTTGTCAACCTGTCCAGGGGCAACT 1080

I I M I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I | I I I I I I | | | | | 
Db 4456 AAGGACAACAGAGTGGTACATAAGGCTAAAACAGAGTTGTCAACCTGTCCAGGGGCAACT 4515

Qy 1081 GGGATGGGGTAGGGCTGGGAGCAGGGGTCTGGCACCTTCCAGGACCCTACTCTGCCTTTG 1140

M I I I I I I I I I I I II M I I I M I I I I I I II I I I I I I I I I I I I I I I I II | I I I | I | | | | | | 
Db 4516 GGGATGGGGTAGGGCTGGGAGCAGGGGTCTGGCACCTTCCAGGACCCTACTCTGCCTTTG 4575 

Qy 1141 CCCTTGTGGGATTTCCTTTAAAGCAACCGTGTCGGGCCTTGGTGGAACATCAAATCATGC 1200 

I I I I I I M I M I I I I I I I I I I I M I I II I I I I I I I I I I I I I I I I I I I II I I I I I I I | | M 
Db 4576 CCCTTGTGGGATTTCCTTTAAAGCAACCGTGTCGGGCCTTGGTGGAACATCAAATCATGC 4635

Qy 1201 CAGCAGAAGTGGGACAGGCAAATCCTCAAAGATGTCTCCTTGTACATCGAGAGTGGCCAG 1260

M I I I I I I I M I I I I I I I I I I I I I I I I I I I I I I II I I ! ! I I I I I I I I I || I I I || I | | | | 
Db 4636 CAGCAGAAGTGGGACAGGCAAATCCTCAAAGATGTCTCCTTGTACATCGAGAGTGGCCAG 4695

Qy 1261 ATTATGTGCATCTTAGGCAGCTCAGGTAAGTGCCTGGGGGGSCSGGGGCTCCTGTACTTC 1320 

I I I I I I I I I I I I I I II I I II I I I I I I I I I I I I II I I I I I I I II M I I II II I II I I I I I I 
Db 4696 ATTATGTGCATCTTAGGCAGCTCAGGTAAGTGCCTGGGGGGSCSGGGGCTCCTGTACTTC 4755 

Qy 1321 TAAGGCAGGCTCTGGGAGGCTTTGGCTCYGTCTAAGCACAATGTTTAAGAAGTRAGTTTA 1380

I I M I I I I I I I I I I I I I I I I I I I I I I I I | | | | | | | | | | | | | | | | | | | | | | | | | | | | | || | 
Db 4756 TAAGGCAGGCTCTGGGAGGCTTTGGCTCYGTCTAAGCACAATGTTTAAGAAGTRAGTTTA 4815

Qy 1381 AGTTGTAGAGAGGCAGCCATGCATTTGGCATTTGAATACAATCTGGTGACTTGTCTGGCT 1440

I I I I I I I I I I I I I I I 1 I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I | | 
Db 4816 AGTTGTAGAGAGGCAGCCATGCATTTGGCATTTGAATACAATCTGGTGACTTGTCTGGCT 4875

Qy 1441 GCCAATAGAACCTAGTACCAAAGTGAAATCTTGAGGAAAATCCCTGGAAAGAGTGGAAAG 1500

I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I II II I I I I I I I I M || I I I I I I I I 
Db 4876 GCCAATAGAACCTAGTACCAAAGTGAAATCTTGAGGAAAATCCCTGGAAAGAGTGGAAAG 4935

Qy 1501 TCCTGCCTAACACGTAAGTGCCTTCTTTGCTTGTTTGATTGACTGTGATGCTAGAGAGCA 1560

I I I I I I I I I I I I I I I I I I I I I I I I II I I I I II I I I I I II I I I I I I I II II I I I I I | I I I | 
Db 4936 TCCTGCCTAACACGTAAGTGCCTTCTTTGCTTGTTTGATTGACTGTGATGCTAGAGAGCA 4995

Qy 1561 AACCCAGAGC 1570

I I I I I I I I I I 
Db 4996 AACCCAGAGC 5005



RESULT 2 

F351786S01 

LOCUS 

DEFINITION 

ACCESSION 

VERSION 

KEYWORDS 

SEGMENT 

SOURCE 

ORGANISM 



F351786S01 1000 bp DNA linear ROD 23-AUG-2002
Mus musculus sterolin-1 (Abcg5) gene, exon 1.
AF351786
AF351786.1 GI:18958385

1 of 13
Mus musculus (house mouse)
Mus musculus



REFERENCE 
AUTHORS 

TITLE 

JOURNAL 
MEDLINE 
PUBMED 
REFERENCE 
AUTHORS 
TITLE 
JOURNAL 



FEATURES 

source 



exon 



Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
Mammalia; Eutheria; Rodentia; Sciurognathi; Muridae; Murinae; Mus.

1 (bases 1 to 1000) 

Lu,K., Lee,M.-H., Yu,H., Zhou,Y., Sandell,S.A., Salen,G. and
Patel,S.B. 

Molecular cloning, genomic organization, genetic variations, and 

characterization of murine sterolin genes Abcg5 and Abcg8

J. Lipid Res. 43 (4), 565-578 (2002) 

21904563 

11907139 

2 (bases 1 to 1000) 

Lu,K., Zhou,Y., Lee,M.-H. and Patel,S.B.
Direct Submission 

Submitted (21-FEB-2001) Division of Endocrinology, Diabetes and
Medical Genetics, Medical University of South Carolina, 114 Doughty 
St, STB 541, Charleston, SC 29403, USA 

Location/Qualifiers

1..1000

/organism="Mus musculus"
/mol_type="genomic DNA"
/strain="129/Sv"
/db_xref="taxon:10090"
/chromosome="17"
/map="between Mit41 and Mit189"
/clone="329B11"

<359..504

/gene="Abcg5"
/number=1



ORIGIN 



Query Match 61.5%; Score 965.4; DB 10; Length 1000;
Best Local Similarity 99.0%; Pred. No. 2.3e-289;
Matches 991; Conservative 1; Mismatches 7; Indels 2; Gaps 2;



Qy 64 GACCAGTGCTGTTTGTGCCCTTTGTGTGGCCTCCCCTGCTGTTGGGCTCTCTCTGTCTTT 123
I I i I I I M II I I I I I I I I I I I I I I I I I I I | M I I I I I I I I I I I I I I I I I | | | | | | | | || |
Db 1 GACCAGTGCTGTTTGTGCCCTTTGTGTGGCCTCCCCTGCTGTTGGGCTCTCTCTGTCTTT 60

Qy 124 GCTCCTTAGAGCTGGGGCACCTGAGCCCTCCTCTGTGCCAGCCTTTCTCCCAGCATTCCT 183
I I I M I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I
Db 61 TGCTCCTTGAGCTGGGGCACATGAGCCCTCCTCTGTGCCAGCCTTTCTCCCAGCATTCCT 120

Qy 184 YTCTGGCAAACACTTCCTATAAACACACCGTGTGTTCTGCCTATTGTCGAGATAAGGACA 243
: II M II I II I I I I I M I I I I I I i I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I
Db 121 CTCTGGCAAACAC-TCCTATAAACACACCGTGTGTTCTGCCTATTGTCGAGATAAGGACA 179

Qy 244 CTCTGGCTAAAGGTACATCAGATAATGGCATCGTTGGCCAAATTGGTGAACTGTTATCTC 303
I I I I I I I M I I I I I I I I I I I I I I I I I I | | | I | | | | | | | | | | | | | | | || | | | || | | | | | | |
Db 180 CTCTGGCTAAAGGTACATCAGATAATGGCATCGTTGGCCAAATTGGTGAACTGTTATCTC 239

Qy 304 ACGAGGATTCCAGGGCTGGGTAGGATCGGACAGGGCACTCCCATTGGCTCCTCAGTTAAA 363
I I I I I M I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I II I I I I I I I I I I I I I I I | I I I I
Db 240 ACGAGGATTCCAGGGCTGGGTAGGATCGGACAGGGCACTCCCATTGGCTCCTCAGTTAAA 299

Qy 364 GCTGCCCTGGAGCCGGACAGGCCACTAGAAAATTCACTTGCATTTGCTTCCTGCTAGCCA 423
M M I II M f I I I M I I I I I I I I I I I I II I I I I I I I I I I I II II I I I I M I I I I I II I I I
Db 300 GCTGCCCTGGAGCCGGACAGGCCACTAGAAAATTCACTTGCATTTGCTTCCTGCTAGCCA 359



Qy 424 TGGGTGAGCTGCCCTTTCTGAGTCCAGAGGGAGCCAGAGGGCCTCACATCAACAGAGGGT 483

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I II I I I I I I I I I I I I 
Db 360 TGGGTGAGCTGCCCTTTCTGAGTCCAGAGGGAGCCAGAGGGCCTCACATCAACAGAGGGT 419 

Qy 484 CTCTGAGCTCCCTGGAGCAAGGTTCGGTCACGGGCACAGAGGCTCGGCACAGCTTAGGTG 543 

M I I I I I I I I I I I I I I I II I I I I I I || I I | || I | | | | || | | M | | | | | | | | | | | | | | | | | 
Db 420 CTCTGAGCTCCCTGGAGCAAGGTTCGGTCACGGGCACAGAGGCTCGGCACAGCTTAGGTG 479 

Qy 544 TCCTGCATGTGTCCTACAGCGTCAGGTAAGGGGACCTCCACAGCAAAAAGCTAGGCTCTC 603 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I || I || 
Db 480 TCCTGCATGTGTCCTACAGCGTCAGGTAAGGGGACCTCCACAGCAAAAAGCTAGGCTCTC 539 

Qy 604 -TGATTGCCTTTTCTGAATGGGTGGGTGGGCCTGTGGGCTTTGGGTTGTCTGTCCAGCAG 662 

I I M I I I I I I I I I M I M I I I I I I I I I I I I II I I I I I I I I II I I I I I I I I I M | | | | | | 
Db 540 TTGATTGCCTTTTCTGAATGGGTGGGTGGGCCTGTGGGCTTTGGGTTGTCTGTCCAGCAG 599 

Qy 663 ATCAGGGTGAAAGTGGACAGTCTGTAACAACAGTGAGTCGTTCCTCCTCCTCCTCCTGCG 722 

I I I I M I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I 
Db 600 ATCAGGGTGAAAGTGGACAGTCTGTAACAACAGTGAGTCGTTCCTCCTCCTCCTCCTGCG 659 

Qy 723 CAGGGCAGAGCCTGGACATTAAAACATGCCCTGCCTGAAGCCGCTTGCTGCTTCTCACTG 782 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I II I I I I I I I I I I I I M I I I I 
Db 660 CAGGGCAGAGCCTGGACATTAAAACATGCCCTGCCTGAAGCCGCTTGCTGCTTCTCACTG 719 

Qy 783 ATTTCTGCTCTCCCCTTCCTTGACTCGCCCACCACCTGTCCTGTGTAGATGGAGAAGGCT 842

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I II I I I I I I I I I I I I I I I I I I II I I I 
Db 720 ATTTCTGCTCTCCCCTTCCTTGACTCGCCCACCACCTGTCCTGTGTAGATGGAGAAGGCT 779 

Qy 843 CGGAGAGTGGGGGTGCTGGGGGCACAAAATGGAATGAACACTGCTGAAGGAATGCAGGGT 902 

M I I I I I I I I M I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I 
Db 780 CGGAGAGTGGGGGTGCTGGGGGCACAAAATGGAATGAACACTGCTGAAGGAATGCAGGGT 839 

Qy 903 TCACTTCAAGAAGAAAGCAGTGTGCAGGTGTACCATCTCCCAGTCAGAGACCCAGTAATC 962 

I I I I I I M I I I I I I M I I I II I I I I I I I I II I I I I I I I I I I I I I I I 1 II I I I I I I I I I II 
Db 840 TCACTTCAAGAAGAAAGCAGTGTGCAGGTGTACCATCTCCCAGTCAGAGACCCAGTAATC 899 

Qy 963 AGAGCAGCTAATGGGAGGCATGCTCCTTGGGTGGTGGCCAACTTGTCATTATACCTCCAA 1022 

I I I I I I I I I I I I I M I I I I I I I I I I I I I I M I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 900 AGAGCAGCTAATGGGAGGCATGCTCCTTGGGTGGTGGCCAACTTGTCATTATACCTCCAA 959 

Qy 1023 GGACAACAGAGTGGTACATAAGGCTAAAACAGAGTTGTCAA 1063 

I I I I I I I I I I I I I I I I I II I I I I I I I I I I I M I I I I I I I I I 
Db 960 GGACAACAGAGTGGTACATAAGGCTAAAACAGAGTTGTCAA 1000 



RESULT 3 

AC120701/c 
LOCUS AC120701 237445 bp DNA linear HTG 21-SEP-2002 
DEFINITION Rattus norvegicus clone CH230-65H6, *** SEQUENCING IN PROGRESS 
4 unordered pieces. 
ACCESSION AC120701 
VERSION AC120701.4 GI:23265381 
KEYWORDS HTG; HTGS_PHASE1; HTGS_DRAFT; HTGS_ENRICHED. 
SOURCE Rattus norvegicus (Norway rat) 
ORGANISM Rattus norvegicus 
Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; 
Mammalia; Eutheria; Rodentia; Sciurognathi; Muridae; Murinae; 
Rattus. 
REFERENCE 1 (bases 1 to 237445) 

AUTHORS Muzny,D.Marie., Metzker,M.Lee., Abramzon,S., Adams,C., Alder,J., 
Allen,C., Allen,H., Alsbrooks,S., Amin,A., Anguiano,D., 
Anyalebechi,V., Aoyagi,A., Ayodeji,M., Baca,E., Baden,H., 
Baldwin,D., Bandaranaike,D., Barber,M., Barnstead,M., Benahmed,F., 
Biswalo,K., Blair,J., Blankenburg,K., Blyth,P., Brown,M., 
Bryant,N., Buhay,C., Burch,P., Burrell,K., Calderon,E., 
Cardenas,V., Carter,K., Cavazos,I., Ceasar,H., Center,A., 
Chacko,J., Chavez,D., Chen,G., Chen,R., Chen,Y., Chen,Z., Chu,J., 
Cleveland,C., Cockrell,R., Cox,C., Coyle,M., Cree,A., D'Souza,L., 
Davila,M.L., Davis,C., Davy-Carroll,L., DeAnda,C., Dederich,D., 
Delgado,O., Denson,S., Deramo,C., Ding,Y., Dinh,H., Divya,K., 
Draper,H., Dugan-Rocha,S., Dunn,A., Durbin,K., Duval,B., Eaves,K., 
Egan,A., Escotto,M., Eugene,C., Evans,C.A., Falls,T., Fan,G., 
Fernandez,S., Finley,M., Flagg,N., Forbes,L., Foster,M., Foster,P., 
Fraser,C.M., Gabisi,A., Ganta,R., Garcia,A., Garner,T., Garza,M., 
Gebregeorgis,E., Geer,K., Gill,R., Grady,M., Guerra,W., Guevara,W., 
Gunaratne,P., Haaland,W., Hamil,C., Hamilton,C., Hamilton,K., 
Harvey,Y., Havlak,P., Hawes,A., Henderson,N., Hernandez,J., 
Hernandez,R., Hines,S., Hladun,S.L., Hodgson,A., Hogues,M., 
Hollins,B., Howells,S., Hulyk,S., Hume,J., Idlebird,D., Jackson,A., 
Jackson,L., Jacob,L., Jiang,H., Johnson,B., Johnson,R., Jolivet,A., 
Karpathy,S., Kelly,S., Kelly,S., Khan,Z., King,L., Kovar,C., 
Kowis,C., Kraft,C.L., Lebow,H., Levan,J., Lewis,L., Li,Z., Liu,J., 
Liu,J., Liu,W., Liu,Y., London,P., Longacre,S., Lopez,J., 
Lorensuhewa,L., Loulseged,H., Lozado,R.J., Lu,X., Ma,J., 
Maheshwari,M., Mahindartne,M., Mahmoud,M., Malloy,K., Mangum,A., 
Mangum,B., Mapua,P., Martin,K., Martin,R., Martinez,E., 
Mawhiney,S., McLeod,M.P., McNeill,T.Z., Meenen,E., 
Milosavljevic,A., Miner,G., Minja,E., Montemayor,J., Moore,S., 
Morgan,M., Morris,K., Morris,S., Munidasa,M., Murphy,M., Nair,L., 
Nankervis,C., Neal,D., Newton,N., Nguyen,N., Norris,S., 
Nwaokelemeh,O., Okwuonu,G., Olarnpunsagoon,A., Pal,S., Parks,K., 
Pasternak,S., Paul,H., Perez,A., Perez,L., Pfannkoch,C., 
Plopper,F., Poindexter,A., Popovic,D., Primus,E., Pu,L.-L., 
Puazo,M., Quiroz,J., Rachlin,E., Reeves,K., Regier,M.A., Reign,R., 
Reilly,B., Reilly,M., Ren,Y., Reuter,M., Richards,S., Riggs,F., 
Rives,C., Rodkey,T., Rojas,A., Rose,M., Rose,R., Ruiz,S.J., 
Sanders,W., Savery,G., Scherer,S., Scott,G., Shatsman,S., Shen,H., 
Shetty,J., Shvartsbeyn,A., Sisson,I., Sitter,C.D., Smajs,D., 
Sneed,A., Sodergren,E., Song,X.-Z., Sorelle,R., Sosa,J., 
Steimle,M., Strong,R., Sutton,A., Svatek,A., Tabor,P., Taylor,C., 
Taylor,T., Thomas,N., Thomas,S., Tingey,A., Trejos,Z., Usmani,K., 
Valas,R., Vera,V., Villasana,D., Waldron,L., Walker,B., Wang,J., 
Wang,Q., Wang,S., Warren,J., Warren,R., Wei,X., White,F., 
Williams,G., Willson,R., Wleczyk,R., Wooden,H., Worley,K., 
Wright,D., Wright,R., Wu,J., Yakub,S., Yen,J., Yoon,L., Yoon,V., 
Yu,F., Zhang,J., Zhou,J., Zhou,X., Zhao,S., Dunn,D., von 
Niederhausern,A., Weiss,R., Smith,D.R., Holt,R.A., Smith,H.O., 
Weinstock,G. and Gibbs,R.A. 

TITLE Direct Submission 

JOURNAL Unpublished 
REFERENCE 2 (bases 1 to 237445) 
AUTHORS Worley,K.C. 
TITLE Direct Submission 
JOURNAL Submitted (09-MAY-2002) Human Genome Sequencing Center, Department 
of Molecular and Human Genetics, Baylor College of Medicine, One 
Baylor Plaza, Houston, TX 77030, USA 
REFERENCE 3 (bases 1 to 237445) 
AUTHORS Rat Genome Sequencing Consortium. 
TITLE Direct Submission 
JOURNAL Submitted (21-SEP-2002) Human Genome Sequencing Center, Department 
of Molecular and Human Genetics, Baylor College of Medicine, One 
Baylor Plaza, Houston, TX 77030, USA 
COMMENT On Sep 21, 2002 this sequence version replaced gi:21908396. 
The sequence in this assembly is a combination of BAC based reads 
and whole genome shotgun sequencing reads assembled using Atlas 
(http://www.hgsc.bcm.tmc.edu/projects/rat/). As a result, the 
sequence may extend beyond the ends of the clone and there may be 
contigs that consist entirely of whole genome shotgun sequence 
reads. Both end sequences and whole genome shotgun sequence only 
contigs will be indicated in the feature table. 
Genome Center 
Center: Baylor College of Medicine 
Center code: BCM 
Web site: http://www.hgsc.bcm.tmc.edu/ 
Contact: hgsc-help@bcm.tmc.edu 
Project Information 
Center project name: GXQV 
Center clone name: CH230-65H6 
Summary Statistics 
Assembly program: Phrap; version 0.990329 
Consensus quality: 209781 bases at least Q40 
Consensus quality: 213033 bases at least Q30 
Consensus quality: 214997 bases at least Q20 
Estimated insert size: 233017; sum-of-contigs estimation 
Quality coverage: 4x in Q20 bases; sum-of-contigs estimation 

NOTE: Estimated insert size may differ from sequence length 
(see http://www.hgsc.bcm.tmc.edu/docs/Genbank_draft_data.html) 
NOTE: This is a 'working draft' sequence. It currently 
consists of 4 contigs. The true order of the pieces 
is not known and their order in this sequence record is 
arbitrary. Gaps between the contigs are represented as 
runs of N, but the exact sizes of the gaps are unknown. 
This record will be updated with the finished sequence 
as soon as it is available and the accession number will 
be preserved. 
1 233866: contig of 233866 bp in length 
233867 233966: gap of unknown length 
233967 235011: contig of 1045 bp in length 
235012 235111: gap of unknown length 
235112 236137: contig of 1026 bp in length 
236138 236237: gap of unknown length 
236238 237445: contig of 1208 bp in length. 
FEATURES Location/Qualifiers 
source 1..237445 
/organism="Rattus norvegicus" 
/mol_type="genomic DNA" 
/db_xref="taxon:10116" 
/clone="CH230-65H6" 
misc_feature 1..1326 
/note="wgs_end_extension 
clone_end: T7" 
misc_feature 8065..8944 
/note="clone_boundary 
clone_end: T7 
site: EcoRI 
end_sequence: BH350813" 
misc_feature complement(232953..233569) 
/note="clone_boundary 
clone_end: Sp6 
site: EcoRI 
end_sequence: BH350815" 

ORIGIN 



Query Match 55.3%; Score 868.6; DB 2; Length 237445; 

Best Local Similarity 81.6%; Pred. No. 4.9e-259; 

Matches 1181; Conservative 3; Mismatches 197; Indels 66; Gaps 13; 

Qy 2 GAAGCATCCTGAAGTACAGTCCCATTCCACAGCTGGGTCTCTTCTTTGGTTTTCTCAGCC 61 

I M I I I I I I I I I I I I II I II I I I I I I I II I I I I I I I I I I I I I I I I I I I I II I I I I I 
Db 137687 GAAGCATCCTGGAGTACAGTCCCGTTCCACAGCTGGGTCTCCTCTTTGGTCTTCTCAGCC 137628 



Qy 62 ATGACC AGTGCTGTTTGTGCCCTTTGTGTGGCCTCCCCTGCTGTTGGG 109 

I I I I I I I I I I I I I || I I I I I | | | | | || | | | | | | | | | | 

Db 137627 ATGACCTGCGGTGTTGTGCCCTTTGTGTGGCTCCTGAGGCCTCCCCTGCTGTTGGCTAGG 137568 



Qy 110 CTCTCTCTGTCTTTGCTCCTTAGAGCTGGGGCACCTGAGCCCTCCTCTGTG 160 

III I I I I II I I I I II I I I I I I II I I I I I I I I II I I I I I I II II 
Db 137567 CCAGGATTCTTTCTGTCTTTGCTCCTTAGAGCTAGGGCACTTGAGTCCTCCTTCCTGGCA 137508 



Qy 161 CCAGCCTTTCTCCCAGCATTCCTYTCTGGCAAACACTTCCTATAAACACACCGTGTGTTC 220 

I I I I I I I I I I I I I I I I I I I I I I I : I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 137507 CCAGCCTTTCTCCCAGCATTCCTCTCTGGCAAGC-CCTCCTATAAACACACTGTGTGTTC 137449 



Qy 221 TGCCTATTGTCGAGATAAGGACACTCTGGCTAAAGGTACATCAGATAATGGCATCGTTGG 280 

I I I II I I I I I I I I I I I I I I II I I I I I I I I II I I I I I I I I I I I I I I I I I I I M M I I I I I I 
Db 137448 TGCCTATTGTCGAGATAAGGACACTCTGGCTAAAGGTACATCAGATAATGGCATCGTTGG 137389 



Qy 281 CCAAATTGGTGAACTGTTATCTCACGAGGATTCCAGGGCTGGGTAGGATCGGACAGGGCA 340 

I I I I I I I I I I I I I I I I I I I I I M I I I I I II I I I I I I I I I I I I I I I I I I I I I I I 
Db 137388 CCAAATCGGTGAACTGTTGTCTCACGAGGACTCTCGGGCTGGATAGGATCTGACAGGGCA 137329 



Qy 341 CTCCCATTGGCTCCTCAGTTAAAGCTGCCCTGGAGCCGGACAGGCCACTAGAAAATTCAC 400 

M I I II I I I I I I I I I I I I I I I I I I I I I I | | | I I I I III III I II I I I 

Db 137328 CTCCCATTGGCTCCTCAGTTAAAGTTGCTCTGAAGCCAGACAGGACACCAGAGGATTCAC 137269 



Qy 401 TTGCATTTGCTTCCTGCTAGCCATGGGTGAGCTGCCCTTTCTGAGTCCAGAGGGAGCCAG 460 

I M M I I I I I I I I M M I I I I I I I M I I I I I I I I I I I I I I I I I I I I | | | | I I I I I 
Db 137268 TCACATTTGCTTCCCGCTGGCCATGAGTGAGCTGCCCTTTCTGAGTCCAGAGGGAGCCAG 137209 



Qy 461 AGGGCCTCACATCAACAGAGGGTCTCTGAGCTCCCTGGAGCAAGGTTCGGTCACGGGCAC 520 

I I I I I I I I I II I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I II II II | M I 
Db 137208 AGGGCCTCACAACAACAGAGGGTCTCAGAGCTCCCTGGAGGAAGGCTCAGTTACAGGCTC 137149 



Qy 521 AGAGGCTCGGCACAGCTTAGGTGTCCTGCATGTGTCCTACAGCGTCAGGTAAGGGGACCT 580 

1 I I I I I I I I I I I I M I I I I I I I I I I I II I I I I I I I I I I I I I I II I I I I I I I II I I I I 
Db 137148 AGAGGCTCGGCACAGCTTAGGTGTCCTGAATGTGTCCTTCAGCGTCAGGTAAGGGGACC- 137090 



Qy 581 CCACAGCAAAAAGCTAGGCTCTC TGATTGCCTTTTCTGAATGGGTGG 627 

I I I I I I I I I I I I I I I I I I I || | | I I I | I I I 

Db 137089 CCACAGCGAAGAGCTAGGCTTCCCACCCTATCTGATGCCTTTTCACACCAAGGTGGGTGG 137030 



Qy 628 GTGGGCCTGTGGGCTTTGGGTTGTCTGTCCAGCAGATCAGGGTGAAAGTGGACAGTCTGT 687 

I I I I I I I I I I I M I I I II II II I I I I I I I I I I I I I I I I II I I I I I II I I I I I || 
Db 137029 GTGGGCCTGTGGGCTTTGGGCTGCCTGTCTAGCAGATCAGGGTGGAAGTGGACAGTTCGT 136970 



Qy 688 AACAACAGTGAGTCGTTCCTCCTCCTCCTCCTGCGCAGGGCAGAGCCTGGACATTAAAAC 747 

I M I I I I I I I I I I I I I I I I I I I I I III Mill I I I I I I II I I I I I 

Db 136969 TGCAACAGTGAGTGG CTCCTCCCCCTGCCCAGAGCAGATCCTGAACATTGAAAC 136916 



Qy 748 ATGCCCTGCCTGAAGCCGCTTGCTGCTTCTCACTGATTTCTGCTCT-CCCCTTCCTTGAC 806 

I I II I I I I I I I I I I I II II I I I I I I I II I I I II I I I I I I I I Ml I I I I I I I | 
Db 136915 ACACCCTGCCTGAAGCCGC-TGCTGCTTCTCATAGATTTCTGCTCTACCCTTTCCTTGGC 136857 



Qy 807 TCGCCCACCACCTGTCCTGTGTAGATGGAGAAGGCTCGGAGAGTGGGGGTGCTGGGGGCA 866 

I I Ml I I I I I I Ml I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I III II 
Db 136856 TGGTCCATCACCTGCCCTCTGTAGATGGAGAAGGCTTGGGAAGTGGGGGTGCT-GGGACA 136798 



Qy 867 CAAAATGGAATGAACACTGCTGAAGGAATGCAGGGTTCACTTCAAGAAGAAAGCAGTGTG 926 

Ml I I I I I I I I I I III II I I I I I I I I I II I II I I I II I I I I I I I I 
Db 136797 CAAGGTGGAATGAACCCTGATGGAGGAATGCAGGGTTCACCTC-AGAATAAAGTGTACAT 136739 



Qy 927 CAGGTGTACCATCTCCCAGTCAGAGACCCAGTAATCAGAGCAGCTAATGGGAGGCATGCT 986 

I I I I I I I I I I I I I I I I I I I I II I I I I I I I || I II I I I I I I I 
Db 136738 GTTACCATCTCACAGCCAGACAGAGATCCAGTAATCAGAGCAGCCAAAGGGAGGCACGTT 136679 



Qy 987 CCTTGGGTGGTGGCCAACTTGTCATTATACCTCCAAGGACAACAGAGTGGTACATAAGGC 1046 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I III I I I I I I I I I I 
Db 136678 CCTTGGGTGGTGGCCAACTTGTCATTACACCTCCAAGGACCACATAGTGTGATGCAAGGC 136619 



Qy 1047 TAAAACAGAGTTGTCAACCTGTCCAGGGGCAACTGGGATGGGGTAGGGCTGGGAG 1101 

I Ml I I M I I I I II I II I I I I I I I I I I I I I I I I I I II I II I I I 
Db 136618 TGAAATAGAGTTGTCATCTTGCACAGGAGGACCTGGGATGGGGTTGGTCTGGGTGTGGGG 136559 



Qy 1102 CAGGGGTCTGGCACCTTCCAGGACCCTACTCTGCCTTTGCCCTTGTGGGATT 1153 



1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 II I 

Db 136558 CTGGGAAACAGGGGTCTGGCACCTTCAAGGGTCCTACTCTGCCTTTTGTTCATGTGGGAT 136499 

Qy 1154 TCCTTTAAAGCAACCGTGTCGGGCCTTGGTGGAACATCAAATCATGCCAGCAGAAGTGGG 1213 

I I I I I I I I I I M I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I M I I I I | I I I 
Db 136498 TTCTTTAAAGCAACCGTGTCGGGCCCTGGTGGAACATCAAATCATGCCAGCAGAAGTGGG 136439 

Qy 1214 ACAGGCAAATCCTCAAAGATGTCTCCTTGTACATCGAGAGTGGCCAGATTATGTGCATCT 1273 

I I I I I I I I I I I I I I I I II I I I II I I I II I I II I I I I I I I I I I II I I I I I I I I I M I I 
Db 136438 ACAGGAAAATCCTCAAAGATGTCTCCTTGTACATCGAGAGTGGCCAGACCATGTGCATCT 136379 

Qy 1274 TAGGCAGCTCAGGTAAGTGCCTGGGGGGSCSGGGGCTCCTGTACTTCTAAGGCAGGCTCT 1333 

I I II I I I I I I I I I I I I I I I I I : III I I I I I I I I I I I I II I I I I I II 

Db 136378 TAGGTAGCTCAGGTAAGCGCCT CGAGGGGTCCTGCACTTGTAAGGCAGACTCT 136326 

Qy 1334 GGGAGGCTTTGGCTCYGTCTAAGCACAATGTTTAAGAAGTRAGTTTAAGTTGTAGAGAGG 1393 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I : I I I I I I I I I II I 

Db 136325 GGGAGGCTGGGGCTCGGTCTAAGCTCGGTGTTTAAGAAATGAGTTTAATTGGGAGGGGAA 136266 

Qy 1394 CAGCCAT 1400 

II I I I I 

Db 136265 CACCCAT 136259 



RESULT 4 
AC112747 
LOCUS AC112747 312858 bp DNA linear HTG 08-OCT-2002 
DEFINITION Rattus norvegicus clone CH230-359E1, *** SEQUENCING IN PROGRESS 
8 unordered pieces. 
ACCESSION AC112747 
VERSION AC112747.3 GI:23270105 
KEYWORDS HTG; HTGS_PHASE1; HTGS_DRAFT; HTGS_ENRICHED. 
SOURCE Rattus norvegicus (Norway rat) 
ORGANISM Rattus norvegicus 
Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; 
Mammalia; Eutheria; Rodentia; Sciurognathi; Muridae; Murinae; 
Rattus. 
REFERENCE 1 (bases 1 to 312858) 
AUTHORS Muzny,D.Marie., Metzker,M.Lee., Abramzon,S., Adams,C., Alder,J., 
Allen,C., Allen,H., Alsbrooks,S., Amin,A., Anguiano,D., 
Anyalebechi,V., Aoyagi,A., Ayodeji,M., Baca,E., Baden,H., 
Baldwin,D., Bandaranaike,D., Barber,M., Barnstead,M., Benahmed,F., 
Biswalo,K., Blair,J., Blankenburg,K., Blyth,P., Brown,M., 
Bryant,N., Buhay,C., Burch,P., Burrell,K., Calderon,E., 
Cardenas,V., Carter,K., Cavazos,I., Ceasar,H., Center,A., 
Chacko,J., Chavez,D., Chen,G., Chen,R., Chen,Y., Chen,Z., Chu,J., 
Cleveland,C., Cockrell,R., Cox,C., Coyle,M., Cree,A., D'Souza,L., 
Davila,M.L., Davis,C., Davy-Carroll,L., De Anda,C., Dederich,D., 
Delgado,O., Denson,S., Deramo,C., Ding,Y., Dinh,H., Divya,K., 
Draper,H., Dugan-Rocha,S., Dunn,A., Durbin,K., Duval,B., Eaves,K., 
Egan,A., Escotto,M., Eugene,C., Evans,C.A., Falls,T., Fan,G., 
Fernandez,S., Finley,M., Flagg,N., Forbes,L., Foster,M., Foster,P., 



Fraser,C.M., Gabisi,A., Ganta,R., Garcia,A., Garner,T., Garza,M., 
Gebregeorgis,E., Geer,K., Gill,R., Grady,M., Guerra,W., Guevara,W., 
Gunaratne,P., Haaland,W., Hamil,C., Hamilton,C., Hamilton,K., 
Harvey,Y., Havlak,P., Hawes,A., Henderson,N., Hernandez,J., 
Hernandez,R., Hines,S., Hladun,S.L., Hodgson,A., Hogues,M., 
Hollins,B., Howells,S., Hulyk,S., Hume,J., Idlebird,D., Jackson,A., 
Jackson,L., Jacob,L., Jiang,H., Johnson,B., Johnson,R., Jolivet,A., 
Karpathy,S., Kelly,S., Kelly,S., Khan,Z., King,L., Kovar,C., 
Kowis,C., Kraft,C.L., Lebow,H., Levan,J., Lewis,L., Li,Z., Liu,J., 
Liu,J., Liu,W., Liu,Y., London,P., Longacre,S., Lopez,J., 
Lorensuhewa,L., Loulseged,H., Lozado,R.J., Lu,X., Ma,J., 
Maheshwari,M., Mahindartne,M., Mahmoud,M., Malloy,K., Mangum,A., 
Mangum,B., Mapua,P., Martin,K., Martin,R., Martinez,E., 
Mawhiney,S., McLeod,M.P., McNeill,T.Z., Meenen,E., 
Milosavljevic,A., Miner,G., Minja,E., Montemayor,J., Moore,S., 
Morgan,M., Morris,K., Morris,S., Munidasa,M., Murphy,M., Nair,L., 
Nankervis,C., Neal,D., Newton,N., Nguyen,N., Norris,S., 
Nwaokelemeh,O., Okwuonu,G., Olarnpunsagoon,A., Pal,S., Parks,K., 
Pasternak,S., Paul,H., Perez,A., Perez,L., Pfannkoch,C., 
Plopper,F., Poindexter,A., Popovic,D., Primus,E., Pu,L.-L., 
Puazo,M., Quiroz,J., Rachlin,E., Reeves,K., Regier,M.A., Reigh,R., 
Reilly,B., Reilly,M., Ren,Y., Reuter,M., Richards,S., Riggs,F., 
Rives,C., Rodkey,T., Rojas,A., Rose,M., Rose,R., Ruiz,S.J., 
Sanders,W., Savery,G., Scherer,S., Scott,G., Shatsman,S., Shen,H., 
Shetty,J., Shvartsbeyn,A., Sisson,I., Sitter,C.D., Smajs,D., 
Sneed,A., Sodergren,E., Song,X.-Z., Sorelle,R., Sosa,J., 
Steimle,M., Strong,R., Sutton,A., Svatek,A., Tabor,P., Taylor,C., 
Taylor,T., Thomas,N., Thomas,S., Tingey,A., Trejos,Z., Usmani,K., 
Valas,R., Vera,V., Villasana,D., Waldron,L., Walker,B., Wang,J., 
Wang,Q., Wang,S., Warren,J., Warren,R., Wei,X., White,F., 
Williams,G., Willson,R., Wleczyk,R., Wooden,H., Worley,K., 
Wright,D., Wright,R., Wu,J., Yakub,S., Yen,J., Yoon,L., Yoon,V., 
Yu,F., Zhang,J., Zhou,J., Zhou,X., Zhao,S., Dunn,D., von 
Niederhausern,A., Weiss,R., Smith,D.R., Holt,R.A., Smith,H.O., 
Weinstock,G. and Gibbs,R.A. 
TITLE Direct Submission 
JOURNAL Unpublished 
REFERENCE 2 (bases 1 to 312858) 
AUTHORS Worley,K.C. 
TITLE Direct Submission 
JOURNAL Submitted (24-FEB-2002) Human Genome Sequencing Center, Department 
of Molecular and Human Genetics, Baylor College of Medicine, One 
Baylor Plaza, Houston, TX 77030, USA 
REFERENCE 3 (bases 1 to 312858) 
AUTHORS Rat Genome Sequencing Consortium. 
TITLE Direct Submission 
JOURNAL Submitted (08-OCT-2002) Human Genome Sequencing Center, Department 
of Molecular and Human Genetics, Baylor College of Medicine, One 
Baylor Plaza, Houston, TX 77030, USA 
COMMENT On Sep 23, 2002 this sequence version replaced gi:21738477. 
The sequence in this assembly is a combination of BAC based reads 
and whole genome shotgun sequencing reads assembled using Atlas 
(http://www.hgsc.bcm.tmc.edu/projects/rat/). Each contig described 
in the feature table below represents a scaffold in the Atlas 
assembly (a 'contig-scaffold'). Within each contig-scaffold, 
individual sequence contigs are ordered and oriented, and separated 
by sized gaps filled with Ns to the estimated size. The sequence 
may extend beyond the ends of the clone and there may be sequence 
contigs within a contig-scaffold that consist entirely of whole 
genome shotgun sequence reads. Both end sequences and whole genome 
shotgun sequence only contigs will be indicated in the feature 
table. 

Genome Center 

Center: Baylor College of Medicine 
Center code: BCM 

Web site: http://www.hgsc.bcm.tmc.edu/ 

Contact: hgsc-help@bcm.tmc.edu 
Project Information 

Center project name: GRAX 

Center clone name: CH230-359E1 
Summary Statistics 

Assembly program: Phrap; version 0.990329 

Consensus quality: 241372 bases at least Q40 

Consensus quality: 245333 bases at least Q30 

Consensus quality: 248022 bases at least Q20 

Estimated insert size: 276767; sum-of-contigs estimation 

Quality coverage: 4x in Q20 bases; sum-of-contigs estimation 



* NOTE: Estimated insert size may differ from sequence length 

* (see http://www.hgsc.bcm.tmc.edu/docs/Genbank_draft_data.html) 

* NOTE: This sequence may represent more than one clone. 

* NOTE: This is a 'working draft' sequence. It currently 

* consists of 8 contigs. The true order of the pieces 

* is not known and their order in this sequence record is 

* arbitrary. Gaps between the contigs are represented as 

* runs of N, but the exact sizes of the gaps are unknown. 

* This record will be updated with the finished sequence 

* as soon as it is available and the accession number will 

* be preserved. 

* 1 155105: contig of 155105 bp in length 

* 155106 155205: gap of unknown length 

* 155206 221765: contig of 66560 bp in length 

* 221766 221865: gap of unknown length 

* 221866 290378: contig of 68513 bp in length 

* 290379 290478: gap of unknown length 

* 290479 293724: contig of 3246 bp in length 

* 293725 293824: gap of unknown length 

* 293825 305790: contig of 11966 bp in length 

* 305791 305890: gap of unknown length 

* 305891 307341: contig of 1451 bp in length 

* 307342 307441: gap of unknown length 

* 307442 309768: contig of 2327 bp in length 

* 309769 309868: gap of unknown length 

* 309869 312858: contig of 2990 bp in length. 
FEATURES Location/Qualifiers 
source 1..312858 
/organism="Rattus norvegicus" 
/mol_type="genomic DNA" 
/db_xref="taxon:10116" 
/clone="CH230-359E1" 
misc_feature 159838..161520 
/note="wgs_contig" 
misc_feature 166727..168287 
/note="wgs_contig" 
misc_feature 190162..191648 
/note="wgs_contig" 
misc_feature 234118..235251 
/note="wgs_contig" 
misc_feature 290479..292119 
/note="wgs_contig" 

ORIGIN 

Query Match 55.3%; Score 868.6; DB 2; Length 312858; 

Best Local Similarity 81.6%; Pred. No. 4.9e-259; 

Matches 1181; Conservative 3; Mismatches 197; Indels 66; Gaps 13; 

Qy 2 GAAGCATCCTGAAGTACAGTCCCATTCCACAGCTGGGTCTCTTCTTTGGTTTTCTCAGCC 61 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II II I I I I I I I I I 
Db 91501 GAAGCATCCTGGAGTACAGTCCCGTTCCACAGCTGGGTCTCCTCTTTGGTCTTCTCAGCC 91560 

Qy 62 ATGACC AGTGCTGTTTGTGCCCTTTGTGTGGCCTCCCCTGCTGTTGGG 109 

I I I I I I I I I I I I I I II I I I I I I I I I I II I II II I I I I 

Db 91561 ATGACCTGCGGTGTTGTGCCCTTTGTGTGGCTCCTGAGGCCTCCCCTGCTGTTGGCTAGG 91620 

Qy 110 CTCTCTCTGTCTTTGCTCCTTAGAGCTGGGGCACCTGAGCCCTCCTCTGTG 160 

II I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I II 
Db 91621 CCAGGATTCTTTCTGTCTTTGCTCCTTAGAGCTAGGGCACTTGAGTCCTCCTTCCTGGCA 91680 

Qy 161 CCAGCCTTTCTCCCAGCATTCCTYTCTGGCAAACACTTCCTATAAACACACCGTGTGTTC 220 

I I I I I I I I I I I II I I I I I I I I II : I II I I I I I I I I II II I I I I I II I I I I I II I I I 
Db 91681 CCAGCCTTTCTCCCAGCATTCCTCTCTGGCAAGC-CCTCCTATAAACACACTGTGTGTTC 91739 

Qy 221 TGCCTATTGTCGAGATAAGGACACTCTGGCTAAAGGTACATCAGATAATGGCATCGTTGG 280 

I I I I II I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I II I I I I I I I I 
Db 91740 TGCCTATTGTCGAGATAAGGACACTCTGGCTAAAGGTACATCAGATAATGGCATCGTTGG 91799 

Qy 281 CCAAATTGGTGAACTGTTATCTCACGAGGATTCCAGGGCTGGGTAGGATCGGACAGGGCA 340 

I I I I I I I I I I II II I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I M I I I I I 
Db 91800 CCAAATCGGTGAACTGTTGTCTCACGAGGACTCTCGGGCTGGATAGGATCTGACAGGGCA 91859 

Qy 341 CTCCCATTGGCTCCTCAGTTAAAGCTGCCCTGGAGCCGGACAGGCCACTAGAAAATTCAC 400 

I I I I I I I I I I I I I I I I I I I I I I I I III III I I II I I I I I I III III II I I I I 
Db 91860 CTCCCATTGGCTCCTCAGTTAAAGTTGCTCTGAAGCCAGACAGGACACCAGAGGATTCAC 91919 

Qy 401 TTGCATTTGCTTCCTGCTAGCCATGGGTGAGCTGCCCTTTCTGAGTCCAGAGGGAGCCAG 460 

I I I I II I II I I I III I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I 
Db 91920 TCACATTTGCTTCCCGCTGGCCATGAGTGAGCTGCCCTTTCTGAGTCCAGAGGGAGCCAG 91979 

Qy 461 AGGGCCTCACATCAACAGAGGGTCTCTGAGCTCCCTGGAGCAAGGTTCGGTCACGGGCAC 520 

I I I I I I I I I II II I I I I I I II I I I I I I I II I I I I I I I I I I I I II M M Ml I 

Db 91980 AGGGCCTCACAACAACAGAGGGTCTCAGAGCTCCCTGGAGGAAGGCTCAGTTACAGGCTC 92039 

Qy 521 AGAGGCTCGGCACAGCTTAGGTGTCCTGCATGTGTCCTACAGCGTCAGGTAAGGGGACCT 580 

II I I I I I I I I I I I II I I I I I I I I I I I II I I I II I I I I I II I I I I I I I I I I I I I I M I 

Db 92040 AGAGGCTCGGCACAGCTTAGGTGTCCTGAATGTGTCCTTCAGCGTCAGGTAAGGGGACC- 92098 

Qy 581 CCACAGCAAAAAGCTAGGCTCTC TGATTGCCTTTTCTGAATGGGTGG 627 

I I I I I I I I I I I I I I I II I I II I I I I I I I II 

Db 92099 CCACAGCGAAGAGCTAGGCTTCCCACCCTATCTGATGCCTTTTCACACCAAGGTGGGTGG 92158 



Qy 628 GTGGGCCTGTGGGCTTTGGGTTGTCTGTCCAGCAGATCAGGGTGAAAGTGGACAGTCTGT 687 



1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 II 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 II 

Db 92159 GTGGGCCTGTGGGCTTTGGGCTGCCTGTCTAGCAGATCAGGGTGGAAGTGGACAGTTCGT 92218 

Qy 688 AACAACAGTGAGTCGTTCCTCCTCCTCCTCCTGCGCAGGGCAGAGCCTGGACATTAAAAC 747 

I I I I I I I I I I I 1 I I I I I I I I I I I I III Mill I I I I I I I I I I I I I 

Db 92219 TGCAACAGTGAGTGG CTCCTCCCCCTGCCCAGAGCAGATCCTGAACATTGAAAC 92272 

Qy 748 ATGCCCTGCCTGAAGCCGCTTGCTGCTTCTCACTGATTTCTGCTCT-CCCCTTCCTTGAC 806 

I I I I I I M I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I III I I I I I I I I 
Db 92273 ACACCCTGCCTGAAGCCGC-TGCTGCTTCTCATAGATTTCTGCTCTACCCTTTCCTTGGC 92331 

Qy 807 TCGCCCACCACCTGTCCTGTGTAGATGGAGAAGGCTCGGAGAGTGGGGGTGCTGGGGGCA 866 

I I III I I I I I I III I I I I I I I I I I M I I I I I II I I I I I I I I I I I I Ml II 
Db 92332 TGGTCCATCACCTGCCCTCTGTAGATGGAGAAGGCTTGGGAAGTGGGGGTGCT-GGGACA 92390 

Qy 867 CAAAATGGAATGAACACTGCTGAAGGAATGCAGGGTTCACTTCAAGAAGAAAGCAGTGTG 926 

III I I I I I I I I I I III II I I I I I I I I I I II I II I I II II I I MM 
Db 92391 CAAGGTGGAATGAACCCTGATGGAGGAATGCAGGGTTCACCTC-AGAATAAAGTGTACAT 92449 

Qy 927 CAGGTGTACCATCTCCCAGTCAGAGACCCAGTAATCAGAGCAGCTAATGGGAGGCATGCT 986 

I I II II II I I I I M II I I I I I I I I I I I II I I II I I II I I I I 
Db 92450 GTTACCATCTCACAGCCAGACAGAGATCCAGTAATCAGAGCAGCCAAAGGGAGGCACGTT 92509 

Qy 987 CCTTGGGTGGTGGCCAACTTGTCATTATACCTCCAAGGACAACAGAGTGGTACATAAGGC 1046 

I M M II I II II M I I I I I I I II I I I I II II I I I I II I I III MM I I I I I I 
Db 92510 CCTTGGGTGGTGGCCAACTTGTCATTACACCTCCAAGGACCACATAGTGTGATGCAAGGC 92569 

Qy 1047 TAAAACAGAGTTGTCAACCTGTCCAGGGGCAACTGGGATGGGGTAGGGCTGGGAG 1101 

I III I I II I II I I I I II I I II I I II I I II I I I II I II I I I II I 
Db 92570 TGAAATAGAGTTGTCATCTTGCACAGGAGGACCTGGGATGGGGTTGGTCTGGGTGTGGGG 92629 

Qy 1102 CAGGGGTCTGGCACCTTCCAGGACCCTACTCTGCCTTTGCCCTTGTGGGATT 1153 

I II II M I I II II I I I I I II I II I II I I I I II I I I II I 

Db 92630 CTGGGAAACAGGGGTCTGGCACCTTCAAGGGTCCTACTCTGCCTTTTGTTCATGTGGGAT 92689 

Qy 1154 TCCTTTAAAGCAACCGTGTCGGGCCTTGGTGGAACATCAAATCATGCCAGCAGAAGTGGG 1213 

I M II M I II II I II I II II I I M I I II II II I I I I I I I I I I I I I I I I I I I I I I I II I 
Db 92690 TTCTTTAAAGCAACCGTGTCGGGCCCTGGTGGAACATCAAATCATGCCAGCAGAAGTGGG 92749 

Qy 1214 ACAGGCAAATCCTCAAAGATGTCTCCTTGTACATCGAGAGTGGCCAGATTATGTGCATCT 1273 

I II I I II I I II II II I I I II I II II I I II II II I I I I I I I II I I I I I I II I I I I II I 
Db 92750 ACAGGAAAATCCTCAAAGATGTCTCCTTGTACATCGAGAGTGGCCAGACCATGTGCATCT 92809 

Qy 1274 TAGGCAGCTCAGGTAAGTGCCTGGGGGGSCSGGGGCTCCTGTACTTCTAAGGCAGGCTCT 1333 

I I I I II I I II I II I II I II I I : Ml II I II I II I I I I I II I I I II I 

Db 92810 TAGGTAGCTCAGGTAAGCGCCT CGAGGGGTCCTGCACTTGTAAGGCAGACTCT 92862 

Qy 1334 GGGAGGCTTTGGCTCYGTCTAAGCACAATGTTTAAGAAGTRAGTTTAAGTTGTAGAGAGG 1393 

I II I II I I II II I I II I I I I I I I I II I I I I II I : I I II I I I I I II I 

Db 92863 GGGAGGCTGGGGCTCGGTCTAAGCTCGGTGTTTAAGAAATGAGTTTAATTGGGAGGGGAA 92922 

Qy 1394 CAGCCAT 1400 

II II II 

Db 92923 CACCCAT 92929 



RESULT 5 



AY145899 
LOCUS AY145899 40929 bp DNA linear ROD 12-NOV-2002 
DEFINITION Rattus norvegicus sterolin 2 (Abcg8) and sterolin 1 (Abcg5) genes, 
complete cds. 
ACCESSION AY145899 
VERSION AY145899.1 GI:24935208 
KEYWORDS 
SOURCE Rattus norvegicus (Norway rat) 
ORGANISM Rattus norvegicus 
Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; 
Mammalia; Eutheria; Rodentia; Sciurognathi; Muridae; Murinae; 
Rattus. 
REFERENCE 1 (bases 1 to 40929) 
AUTHORS Yu,H., Lu,K., Lee,M., Pandit,B. and Patel,S.B. 
TITLE The rat Abcg5 and Abcg8: characterization, chromosomal assignment 
and genetic variation in sitosterolemic rats 
JOURNAL Unpublished 
REFERENCE 2 (bases 1 to 40929) 
AUTHORS Yu,H., Lu,K., Lee,M., Pandit,B. and Patel,S.B. 
TITLE Direct Submission 
JOURNAL Submitted (29-AUG-2002) Endocrinology, Diabetes and Medical 
Genetics, Medical University of South Carolina, 114 Doughty Street, 
STR 541, Charleston, SC 29403, USA 
FEATURES Location/Qualifiers 
source 1..40929 
/organism="Rattus norvegicus" 
/mol_type="genomic DNA" 
/strain="Sprague-Dawley" 
/db_xref="taxon:10116" 
gene complement(<4136..>20831) 
/gene="Abcg8" 
mRNA complement(join(<4136..4273,4361..4488,5693..5960, 
6513..6589,6754..6953,8189..8269,8350..8512,10772..11041, 
11129..11261,11647..11885,15513..15669,17473..17574, 
20769..>20831)) 
/gene="Abcg8" 
/product="sterolin 2" 
CDS complement(join(4136..4273,4361..4488,5693..5960, 
6513..6589,6754..6953,8189..8269,8350..8512,10772..11041, 
11129..11261,11647..11885,15513..15669,17473..17574, 
20769..20831)) 
/gene="Abcg8" 
/note="ATP-binding cassette sub-family G (WHITE) member 8" 
/codon_start=1 
/product="sterolin 2" 
/protein_id="AAN64276.1" 
/db_xref="GI:24935210" 
/translation="MAEKTKEETQLWNGTVLQDASSLQDSVFSSESDNSLYFTYSGQS 
NTLEVRDLTYQVDMASQVPWFEQLAQFKLPWRSRGSQDSWDLGIRNLSFKVRSGQMLA 
IIGSAGCGRATLLDVITGRDHGGKMKSGQIWINGQPSTPQLIQKCVAHVRQQDQLLPN 
LTVRETLTFIAQMRLPKTFSQAQRDKRVEDVIAELRLRQCANTRVGNTYVRGVSGGER 
RRVSIGVQLLWNPGILILDEPTSGLDSFTAHNLVRTLSRLAKGNRLVLISLHQPRSDI 
FRLFDLVLLMTSGTPIYLGVAQHMVQYFTSIGYPCPRYSNPADFYVDLTSIDRRSKEQ 
EVATMEKARLLAALFLEKVQGFDDFLWKAEAKSLDTGTYAVSQTLTQDTNCGTAAELP 
GMIQQFTTLIRRQISNDFRDLPTLFIHGAEACLMSLIIGFLYYGHADKPLSFMDMAAL 
LFMIGALIPFNVILDWSKCHSERSLLYYELEDGLYTAGPYFFAKVLGELPEHCAYVI 
IYGMPIYWLTNLRPGPELFLLHFMLLWLVVFC^ 
NSFYLTAGFMINLNNLWIVPAWISKMSFLRWCFSGLMQIQFNGHIYTTQIGNLTFSVP 
GDAMVTAMDLNSHPLYAIYLIVIGISCGFLSLYYLSLKFIKQKSIQDW" 
gene <21211..>40564 
/gene="Abcg5" 
mRNA join(<21211..21356,21968..22089,24726..24862,24949..25047, 
27388..27520,28838..28977,29879..30008,30715..30928, 
31032..31237,32869..33007,35821..36006,38553..38665, 
40371..>40564) 
/gene="Abcg5" 
/product="sterolin 1" 
CDS join(21211..21356,21968..22089,24726..24862,24949..25047, 
27388..27520,28838..28977,29879..30008,30715..30928, 
31032..31237,32869..33007,35821..36006,38553..38665, 
40371..40564) 
/gene="Abcg5" 
/note="ATP-binding cassette sub-family G (WHITE) member 5" 
/codon_start=1 
/product="sterolin 1" 
/protein_id="AAN64275.1" 
/db_xref="GI:24935209" 
/translation="MSELPFLSPEGARGPHNNRGSQSSLEEGSVTGSEARHSLGVLNV 
SFSVSNRVGPWWNIKSCQQKWDRKILKDVSLYIESGQTMCILGSSGSGKTTLLDAISG 
RLRRTGTLEGEVFVNGCELRRDQFQDCVSYLLQSDVFLSSLTVRETLRYTAMLALRSS 
SADFYDKKVEAVLTELSLSHVADQMIGNYNFGGISSGERRRVSIAAQLLQDPKVMMLD 
EPTTGLDCMTANHIVLLLVELARRNRIVIVTIHQPRSELFHHFDKIAILTYGELVFCG 
TPEEMLGFFNNCGYPCPEHSNPFDFYMDLTSVDTQSREREIETYKRVQMLESAFRQSD 
ICHKILENIERTRHLKTLPMVPFKTKNPPGMFCKLGVLLRRVTRNLMRNKQWIMRLV 
QNLIMGLFLIFYLLRVQNNMLKGAVQDRVGLLYQLVGATPYTGMLNAVNLFPMLRAVS 
DQESQDGLYQKWQMLLAYVLHALPFSIVATVIFSSVCYWTLGLYPEVARFGYFSAALL 
APHLIGEFLTLVLLGMVQNPNIVNSIVALLSISGLLIGSGFIRNIEEMPIPLKILGYF 
TFQKYCCEILWNEFYGLNFTCGGSNTSVPNNPMCSMTQGIQFIEKTCPGATSRFTTN 
FLILYSFIPTLVILGMWFKVRDYLISR" 

ORIGIN 



Query Match 55.3%; Score 868.2; DB 10; Length 40929; 

Best Local Similarity 81.6%; Pred. No. 6e-259; 

Matches 1181; Conservative 3; Mismatches 196; Indels 67; Gaps 13; 

Qy 2 GAAGCATCCTGAAGTACAGTCCCATTCCACAGCTGGGTCTCTTCTTTGGTTTTCTCAGCC 61 

I I ! I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 20770 GAAGCATCCTGGAGTACAGTCCCGTTCCACAGCTGGGTCTCCTCTTTGGTCTTCTCAGCC 20829 

Qy 62 ATGACC AGTGCTGTTTGTGCCCTTTGTGTGGCCTCCCCTGCTGTTGGG 109

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

Db 20830 ATGACCTGCGGTGTTGTGCCCTTTGTGTGGCTCCTGAGGCCTCCCCTGCTGTTGGCTAGG 20889

Qy 110 CTCTCTCTGTCTTTGCTCCTTAGAGCTGGGGCACCTGAGCCCTCCTCTGTG 160 

III I I I II I I 1 I I II I I I II I I II I I I I I I I I I I I I I I I I I II 
Db 20890 CCAGGATTCTTTCTGTCTTTGCTCCTTAGAGCTAGGGCACTTGAGTCCTCCTTCCTGGCA 20949

Qy 161 CCAGCCTTTCTCCCAGCATTCCTYTCTGGCAAACACTTCCTATAAACACACCGTGTGTTC 220

I I II I I I I I I I I I I I I I I I I I I I : I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I 
Db 20950 CCAGCCTTTCTCCCAGCATTCCTCTCTGGCAAGC-CCTCCTATAAACACACTGTGTGTTC 21008 



Qy   221 TGCCTATTGTCGAGATAAGGACACTCTGGCTAAAGGTACATCAGATAATGGCATCGTTGG 280
         ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 21009 TGCCTATTGTCGAGATAAGGACACTCTGGCTAAAGGTACATCAGATAATGGCATCGTTGG 21068

Qy 281 CCAAATTGGTGAACTGTTATCTCACGAGGATTCCAGGGCTGGGTAGGATCGGACAGGGCA 340

I I I I I I I II I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I II I I I I II 
Db 21069 CCAAATCGGTGAACTGTTGTCTCACGAGGACTCTCGGGCTGGATAGGATCTGACAGGGCA 21128 

Qy 341 CTCCCATTGGCTCCTCAGTTAAAGCTGCCCTGGAGCCGGACAGGCCACTAGAAAATTCAC 400 

I I I I I I I I I I I I I I I I I I I II I I I III III I I I I I I I I I I III III I I | I I I 
Db 21129 CTCCCATTGGCTCCTCAGTTAAAGTTGCTCTGAAGCCAGACAGGACACCAGAGGATTCAC 21188

Qy 401 TTGCATTTGCTTCCTGCTAGCCATGGGTGAGCTGCCCTTTCTGAGTCCAGAGGGAGCCAG 460 

I I I I I I I I I I I I III I I I II I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 21189 TCACATTTGCTTCCCGCTGGCCATGAGTGAGCTGCCCTTTCTGAGTCCAGAGGGAGCCAG 21248

Qy 461 AGGGCCTCACATCAACAGAGGGTCTCTGAGCTCCCTGGAGCAAGGTTCGGTCACGGGCAC 520

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II II II III I 
Db 21249 AGGGCCTCACAACAACAGAGGGTCTCAGAGCTCCCTGGAGGAAGGCTCAGTTACAGGCTC 21308

Qy 521 AGAGGCTCGGCACAGCTTAGGTGTCCTGCATGTGTCCTACAGCGTCAGGTAAGGGGACCT 580 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 21309 AGAGGCTCGGCACAGCTTAGGTGTCCTGAATGTGTCCTTCAGCGTCAGGTAAGGG--ACC 21366

Qy 581 CCACAGCAAAAAGCTAGGCTCTC TGATTGCCTTTTCTGAATGGGTGG 627 

I I I I I I I I I I I I I I I I I I I II I I I I I I I I I 

Db 21367 CCACAGCGAAGAGCTAGGCTTCCCACCCTATCTGATGCCTTTTCACACCAAGGTGGGTGG 21426 

Qy 628 GTGGGCCTGTGGGCTTTGGGTTGTCTGTCCAGCAGATCAGGGTGAAAGTGGACAGTCTGT 687

I I I I I I II I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I II I I II 
Db 21427 GTGGGCCTGTGGGCTTTGGGCTGCCTGTCTAGCAGATCAGGGTGGAAGTGGACAGTTCGT 21486

Qy 688 AACAACAGTGAGTCGTTCCTCCTCCTCCTCCTGCGCAGGGCAGAGCCTGGACATTAAAAC 747 

I I I I I I I I I I I I I I I I I I I I I I I I III I I I I I I I I I I I I I I I I I I 

Db 21487 T GCAACAGTGAGT GG CTCCTCCCCCTGCCCAGAGCAGATCCTGAACATTGAAAC 21540 

Qy 748 ATGCCCTGCCTGAAGCCGCTTGCTGCTTCTCACTGATTTCTGCTCT-CCCCTTCCTTGAC 806

I I I I I I I I I I I I I I I I I I I I I I I I I I I 1 I I I I I I I I I I I I I III I I I I I I I I 
Db 21541 ACACCCTGCCTGAAGCCGC-TGCTGCTTCTCATAGATTTCTGCTCTACCCTTTCCTTGGC 21599 

Qy 807 TCGCCCACCACCTGTCCTGTGTAGATGGAGAAGGCTCGGAGAGTGGGGGTGCTGGGGGCA 866

I I Ml I I I I I I II I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I III II 
Db 21600 TGGTCCATCACCTGCCCTCTGTAGATGGAGAAGGCTTGGGAAGTGGGGGTGCT-GGGACA 21658 

Qy 867 CAAAATGGAATGAACACTGCTGAAGGAATGCAGGGTTCACTTCAAGAAGAAAGCAGTGTG 926

III II I I I I I I I I III II I I I I I I I I I I I I I I I I I II I I I I I I I I I 
Db 21659 CAAGGTGGAATGAACCCTGATGGAGGAATGCAGGGTTCACCTC-AGAATAAAGTATACAT 21717

Qy 927 CAGGTGTACCATCTCCCAGTCAGAGACCCAGTAATCAGAGCAGCTAATGGGAGGCATGCT 986

I I I I I I I I I I I I I I I I I I I I I I II I I I I I II 1 I II I I I I I I 
Db 21718 GTTACCATCTCACAGCCAGACAGAGATCCAGTAATCAGAGCAGCCAAAGGGAGGCACGTT 21777

Qy 987 CCTTGGGTGGTGGCCAACTTGTCATTATACCTCCAAGGACAACAGAGTGGTACATAAGGC 1046

I I II I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I III I I I I I I I I I I 
Db 21778 CCTTGGGTGGTGGCCAACTTGTCATTACACCTCCAAGGACCACATAGTGTGATGCAAGGC 21837



Qy 1047 TAAAACAGAGTTGTCAACCTGTCCAGGGGCAACTGGGATGGGGTAGGGCTGGGAG 1101 

I III I I I I I I I II I I II I I I I I I I I I I I I I I I I I II Mill I 
Db 21838 TGAAATAGAGTTGTCATCTTGCACAGGAGGACCTGGAATGGGGTTGGTCTGGGTGTGGGG 21897 

Qy 1102 CAGGGGTCTGGCACCTTCCAGGACCCTACTCTGCCTTTGCCCTTGTGGGATT 1153 

I I I I I I I I I I I I I I I I I 1 II I I I I I II I I I I I I I I I I I I I 

Db 21898 CTGGGAAACAGGGGTCTGGCACCTTCAAGGGTCCTACTCTGCCTTTTGTTCATGTGGATT 21957 

Qy 1154 TCCTTTAAAGCAACCGTGTCGGGCCTTGGTGGAACATCAAATCATGCCAGCAGAAGTGGG 1213 

I I I I I II I I I I II I II I I I I I I I I I II I I I I I 11 I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 21958 TCCTTTAAAGCAACCGTGTCGGGCCCTGGTGGAACATCAAATCATGCCAGCAGAAGTGGG 22017 

Qy 1214 ACAGGCAAATCCTCAAAGATGTCTCCTTGTACATCGAGAGTGGCCAGATTATGTGCATCT 1273

I I I I I I I I I I I I I I I I I I I II II I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I 
Db 22018 ACAGGAAAATCCTCAAAGATGTCTCCTTGTACATCGAGAGTGGCCAGACCATGTGCATCT 22077

Qy 1274 TAGGCAGCTCAGGTAAGTGCCTGGGGGGSCSGGGGCTCCTGTACTTCTAAGGCAGGCTCT 1333 

I I II I I I I I I I I I I I I I I I I I : III I I I I I II I I II I I II I I I I I I 

Db 22078 TAGGTAGCTCAGGTAAGCGCCT C GAGGGGT C CT GCACT T GTAAGGCAGACT CT 22130 

Qy 1334 GGGAGGCTTTGGCTCYGTCTAAGCACAATGTTTAAGAAGTRAGTTTAAGTTGTAGAGAGG 1393 

I I I I I I I I I I I I I I II I I I I I I I M I I I I I I I I : I I I I I II I I II I 

Db 22131 GGGAGGCTGGGGCTCGGTCTAAGCTCGGTGTTTAAGAAATGAGTTTAATTGGGAGGGGAA 22190 

Qy 1394 CAGCCAT 1400 

II I I I I 

Db 22191 CACCCAT 22197 



RESULT 6 

AF404108/c
LOCUS       AF404108 567 bp DNA linear ROD 14-AUG-2001
DEFINITION  Mus musculus sterolin 1 (Abcg5) and sterolin 2 (Abcg8) genes,
            partial cds.
ACCESSION   AF404108
VERSION     AF404108.1 GI:15150321
KEYWORDS
SOURCE      Mus musculus (house mouse)
  ORGANISM  Mus musculus
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Rodentia; Sciurognathi; Muridae; Murinae; Mus.
REFERENCE   1 (bases 1 to 567)
  AUTHORS   Lu,K., Lee,M. and Patel,S.B.
  TITLE     Molecular cloning, genomic structure and characterization of novel
            murine ABC genes Abcg5 and Abcg8
  JOURNAL   Unpublished
REFERENCE   2 (bases 1 to 567)
  AUTHORS   Lu,K., Lee,M. and Patel,S.B.
  TITLE     Direct Submission
  JOURNAL   Submitted (11-JUL-2001) Division of Endocrinology, Diabetes and
            Medical Genetics, Medical University of South Carolina, 114 Doughty
            St, STB 541, Charleston, SC 29403
FEATURES             Location/Qualifiers
     source          1..567
                     /organism="Mus musculus"
                     /mol_type="genomic DNA"
                     /strain="C57BL/6"
                     /db_xref="taxon:10090"
     gene            complement(<1..>146)
                     /gene="Abcg5"
     mRNA            complement(<1..>146)
                     /gene="Abcg5"
                     /product="sterolin 1"
     CDS             complement(<1..146)
                     /gene="Abcg5"
                     /codon_start=1
                     /product="sterolin 1"
                     /protein_id="AAK85390.1"
                     /db_xref="GI:15150322"
                     /translation="MGELPFLSPEGARGPHINRGSLSSLEQGSVTGTEARHSLGVLHV
                     SYSV"
     exon            complement(<1..>146)
                     /gene="Abcg5"
                     /number=1
     misc_feature    147..504
                     /note="contains 5'UTR and promoter regions for ABCG5 and
                     ABCG8"
     gene            <505..>567
                     /gene="Abcg8"
     mRNA            <505..>567
                     /gene="Abcg8"
                     /product="sterolin 2"
     CDS             505..>567
                     /gene="Abcg8"
                     /codon_start=1
                     /product="sterolin 2"
                     /protein_id="AAK85391.1"
                     /db_xref="GI:15150323"
                     /translation="MAEKTKEETQLWNGTVLQDAS"
     exon            <505..>567
                     /gene="Abcg8"
                     /number=1
ORIGIN



Query Match 35.4%; Score 555.6; DB 10; Length 567;
Best Local Similarity 99.6%; Pred. No. 2.1e-161;
Matches 566; Conservative 1; Mismatches 0; Indels 1; Gaps 1;



Qy   1 CGAAGCATCCTGAAGTACAGTCCCATTCCACAGCTGGGTCTCTTCTTTGGTTTTCTCAGC 60
       ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 567 CGAAGCATCCTGAAGTACAGTCCCATTCCACAGCTGGGTCTCTTCTTTGGTTTTCTCAGC 508

Qy  61 CATGACCAGTGCTGTTTGTGCCCTTTGTGTGGCCTCCCCTGCTGTTGGGCTCTCTCTGTC 120
       ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 507 CATGACCAGTGCTGTTTGTGCCCTTTGTGTGGCCTCCCCTGCTGTTGGGCTCTCTCTGTC 448

Qy 121 TTTGCTCCTTAGAGCTGGGGCACCTGAGCCCTCCTCTGTGCCAGCCTTTCTCCCAGCATT 180
       ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 447 TTTGCTCCTTAGAGCTGGGGCACCTGAGCCCTCCTCTGTGCCAGCCTTTCTCCCAGCATT 388

Qy 181 CCTYTCTGGCAAACACTTCCTATAAACACACCGTGTGTTCTGCCTATTGTCGAGATAAGG 240
       |||:|||||||||||| |||||||||||||||||||||||||||||||||||||||||||
Db 387 CCTCTCTGGCAAACAC-TCCTATAAACACACCGTGTGTTCTGCCTATTGTCGAGATAAGG 329

Qy 241 ACACTCTGGCTAAAGGTACATCAGATAATGGCATCGTTGGCCAAATTGGTGAACTGTTAT 300
       ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 328 ACACTCTGGCTAAAGGTACATCAGATAATGGCATCGTTGGCCAAATTGGTGAACTGTTAT 269

Qy 301 CTCACGAGGATTCCAGGGCTGGGTAGGATCGGACAGGGCACTCCCATTGGCTCCTCAGTT 360
       ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 268 CTCACGAGGATTCCAGGGCTGGGTAGGATCGGACAGGGCACTCCCATTGGCTCCTCAGTT 209

Qy 361 AAAGCTGCCCTGGAGCCGGACAGGCCACTAGAAAATTCACTTGCATTTGCTTCCTGCTAG 420
       ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 208 AAAGCTGCCCTGGAGCCGGACAGGCCACTAGAAAATTCACTTGCATTTGCTTCCTGCTAG 149

Qy 421 CCATGGGTGAGCTGCCCTTTCTGAGTCCAGAGGGAGCCAGAGGGCCTCACATCAACAGAG 480
       ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 148 CCATGGGTGAGCTGCCCTTTCTGAGTCCAGAGGGAGCCAGAGGGCCTCACATCAACAGAG 89

Qy 481 GGTCTCTGAGCTCCCTGGAGCAAGGTTCGGTCACGGGCACAGAGGCTCGGCACAGCTTAG 540
       ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  88 GGTCTCTGAGCTCCCTGGAGCAAGGTTCGGTCACGGGCACAGAGGCTCGGCACAGCTTAG 29

Qy 541 GTGTCCTGCATGTGTCCTACAGCGTCAG 568
       ||||||||||||||||||||||||||||
Db  28 GTGTCCTGCATGTGTCCTACAGCGTCAG 1



RESULT 7 

AF404109/c 

LOCUS       AF404109 588 bp DNA linear ROD 14-AUG-2001
DEFINITION  Rattus norvegicus sterolin 1 (Abcg5) and sterolin 2 (Abcg8) genes,
            partial cds.
ACCESSION   AF404109
VERSION     AF404109.1 GI:15150324
KEYWORDS
SOURCE      Rattus norvegicus (Norway rat)
  ORGANISM  Rattus norvegicus
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Rodentia; Sciurognathi; Muridae; Murinae;
            Rattus.
REFERENCE   1 (bases 1 to 588)
  AUTHORS   Lu,K., Lee,M. and Patel,S.B.
  TITLE     Molecular cloning, genomic structure and characterization of novel
            murine ABC genes Abcg5 and Abcg8
  JOURNAL   Unpublished
REFERENCE   2 (bases 1 to 588)
  AUTHORS   Lu,K., Lee,M. and Patel,S.B.
  TITLE     Direct Submission
  JOURNAL   Submitted (11-JUL-2001) Division of Endocrinology, Diabetes and
            Medical Genetics, Medical University of South Carolina, 114 Doughty
            St, STB 541, Charleston, SC 29403
FEATURES             Location/Qualifiers
     source          1..588
                     /organism="Rattus norvegicus"
                     /mol_type="genomic DNA"
                     /strain="Sprague-Dawley"
                     /db_xref="taxon:10116"
     gene            complement(<1..>146)
                     /gene="Abcg5"
     mRNA            complement(<1..>146)
                     /gene="Abcg5"
                     /product="sterolin 1"
     CDS             complement(<1..146)
                     /gene="Abcg5"
                     /codon_start=1
                     /product="sterolin 1"
                     /protein_id="AAK85392.1"
                     /db_xref="GI:15150325"
                     /translation="MGELPFLSPEGARGPHNNRGSQSSLEEGSVTGSEARHSLGVLNV
                     SFSV"
     exon            <1..>146
                     /number=1
     misc_feature    147..525
                     /note="contains 5'UTR and promoter regions for ABCG5 and
                     ABCG8"
     gene            <526..>588
                     /gene="Abcg8"
     mRNA            <526..>588
                     /gene="Abcg8"
                     /product="sterolin 2"
     CDS             526..>588
                     /gene="Abcg8"
                     /codon_start=1
                     /product="sterolin 2"
                     /protein_id="AAK85393.1"
                     /db_xref="GI:15150326"
                     /translation="MAQTTKEETQLWNGTVLQDAS"
     exon            <526..>588
                     /gene="Abcg8"
                     /number=1
ORIGIN



Query Match 26.3%; Score 412.4; DB 10; Length 588;
Best Local Similarity 86.4%; Pred. No. 1.1e-116;
Matches 508; Conservative 1; Mismatches 57; Indels 22; Gaps 4;



Qy 2 GAAGCATCCTGAAGTACAGTCCCATTCCACAGCTGGGTCTCTTCTTTGGTTTTCTCAGCC 61
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I II I I I II III I I I I
Db 587 GAAGCATCCTGGAGTACAGTCCCGTTCCACAGCTGGGTCTCCTCTTTGGTCGTCTGAGCC 528

Qy 62 ATGACC AGTGCTGTTTGTGCCCTTTGTGTGGCCTCCCCTGCTGTT 106
I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I
Db 527 ATGACCTGCGGTGTTGTGCCCTTTGTGTGGCTCCTGAGGCCTCCCCTGCTGTTGGCTAGG 468

Qy 107 GGGCTCTCTCTGTCTTTGCTCCTTAGAGCTGGGGCACCTGAGCCCTCCTCTGTG 160
II III I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II
Db 467 CCAGGATTCTTTCTGTCTTTGCTCCTTAGAGCTAGGGCACTTGAGTCCTCCTTCCTGGCA 408

Qy 161 CCAGCCTTTCTCCCAGCATTCCTYTCTGGCAAACACTTCCTATAAACACACCGTGTGTTC 220
II I I II I I I I I I I I I I II I I I I I : I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I
Db 407 CCAGCCTTTCTCCCAGCATTCCTCTCTGGCAAGC-CCTCCTATAAACACACTGTGTGTTC 349

Qy 221 TGCCTATTGTCGAGATAAGGACACTCTGGCTAAAGGTACATCAGATAATGGCATCGTTGG 280
I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I II I II I I I I I I I I I I I II I I M I I I
Db 348 TGCCTATTGTCGAGATAAGGACACTCTGGCTAAAGGTACATCAGATAATGGCATCGTTGG 289



Qy 281 CCAAATTGGTGAACTGTTATCTCACGAGGATTCCAGGGCTGGGTAGGATCGGACAGGGCA 340

I 1 I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I II I I I I 
Db 288 CCAAATCGGTGAACTGTTGTCTCACGAGGACTCTCGGGCTGGATAGGATCTGACAGGGCA 229

Qy 341 CTCCCATTGGCTCCTCAGTTAAAGCTGCCCTGGAGCCGGACAGGCCACTAGAAAATTCAC 400

I I I I I I I I I I I I I I I I I 1 II I I I I I II Ml I I I I I I I I I I III III MINI 
Db 228 CTCCCATTGGCTCCTCAGTTAAAGTTGCTCTGAAGCCAGACAGGACACCAGAGGATTCAC 169

Qy 401 TTGCATTTGCTTCCTGCTAGCCATGGGTGAGCTGCCCTTTCTGAGTCCAGAGGGAGCCAG 460 

I I I I II I I I I I I III I t M I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II 
Db 168 TCACATTTGCTTCCCGCTGGCCATGGGTGAGCTGCCCTTTCTGAGTCCAGAGGGAGCCAG 109 

Qy 461 AGGGCCTCACATCAACAGAGGGTCTCTGAGCTCCCTGGAGCAAGGTTCGGTCACGGGCAC 520

I I I I I I II I II 1 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II II II Ml I 
Db 108 AGGGCCTCACAACAACAGAGGGTCTCAGAGCTCCCTGGAGGAAGGCTCAGTTACAGGCTC 49

Qy 521 AGAGGCTCGGCACAGCTTAGGTGTCCTGCATGTGTCCTACAGCGTCAG 568

M I I II I II I II I I I II I I II I M M I I I M M I II I II I I II I II 
Db 48 AGAGGCTCGGCACAGCTTAGGTGTCCTGAATGTGTCCTTCAGCGTCAG 1



RESULT 8 

AF351786S02
LOCUS       AF351786S02 463 bp DNA linear ROD 23-AUG-2002
DEFINITION  Mus musculus sterolin-1 (Abcg5) gene, exon 2.
ACCESSION   AF351787
VERSION     AF351787.1 GI:18958386
KEYWORDS
SEGMENT     2 of 13
SOURCE      Mus musculus (house mouse)
  ORGANISM  Mus musculus
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Rodentia; Sciurognathi; Muridae; Murinae; Mus.
REFERENCE   1 (bases 1 to 463)
  AUTHORS   Lu,K., Lee,M.-H., Yu,H., Zhou,Y., Sandell,S.A., Salen,G. and
            Patel,S.B.
  TITLE     Molecular cloning, genomic organization, genetic variations, and
            characterization of murine sterolin genes Abcg5 and Abcg8
  JOURNAL   J. Lipid Res. 43 (4), 565-578 (2002)
  MEDLINE   21904563
   PUBMED   11907139
REFERENCE   2 (bases 1 to 463)
  AUTHORS   Lu,K., Zhou,Y., Lee,M.-H. and Patel,S.B.
  TITLE     Direct Submission
  JOURNAL   Submitted (21-FEB-2001) Division of Endocrinology, Diabetes and
            Medical Genetics, Medical University of South Carolina, 114 Doughty
            St, STB 541, Charleston, SC 29403, USA
FEATURES             Location/Qualifiers
     source          1..463
                     /organism="Mus musculus"
                     /mol_type="genomic DNA"
                     /strain="129/Sv"
                     /db_xref="taxon:10090"
                     /chromosome="17"
                     /map="between Mit41 and Mit189"
                     /clone="329B11"
     exon            101..222
                     /gene="Abcg5"
                     /number=2
ORIGIN



Query Match 25.6%; Score 402.6; DB 10; Length 463; 

Best Local Similarity 98.3%; Pred. No. 1.2e-113; 

Matches 402; Conservative 4; Mismatches 3; Indels 0; Gaps 0; 

Qy 1064 CCTGTCCAGGGGCAACTGGGATGGGGTAGGGCTGGGAGCAGGGGTCTGGCACCTTCCAGG 1123 

I I 1 I I I I I I I I I I I I I I I I I I I I I I 1 I I I I I I I I 1 I I I I I I I I I I M I M I I I i I I I I I I 
Db 1 CCTGTCCAGGGGCAACTGGGATGGGGTAGGGCTGGGAGCAGGGGTCTGGCACCTTCCAGG 60 

Qy 1124 ACCCTACTCTGCCTTTGCCCTTGTGGGATTTCCTTTAAAGCAACCGTGTCGGGCCTTGGT 1183 

I I I I I 1 II I I I I II I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I 
Db 61 ACCCTACTCTGCCTTTGCCCTTGTGGGATTTCCTTTAAAGCAACCGTGTCGGGCCTTGGT 120

Qy 1184 GGAACATCAAATCATGCCAGCAGAAGTGGGACAGGCAAATCCTCAAAGATGTCTCCTTGT 1243
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  121 GGAACATCAAATCATGCCAGCAGAAGTGGGACAGGCAAATCCTCAAAGATGTCTCCTTGT 180

Qy 1244 ACATCGAGAGTGGCCAGATTATGTGCATCTTAGGCAGCTCAGGTAAGTGCCTGGGGGGSC 1303 

I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I II I I I I I I M I I I I I I I I I I I I I I : I 

Db 181 ACATCGAGAGTGGCCAGATTATGTGCATCTTAGGCAGCTCAGGTAAGTGCCTGGGGGGGC 240

Qy 1304 SGGGGCTCCTGTACTTCTAAGGCAGGCTCTGGGAGGCTTTGGCTCYGTCTAAGCACAATG 1363 

: I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I : I I I I I I I I I I I I I I 
Db 241 CGGGGCTCCTGTACTTCTAAGGCAGGCTCTGGGAGGCTTTGGCTCTGTCTAAGCACAATG 300 

Qy 1364 TTTAAGAAGTRAGTTTAAGTTGTAGAGAGGCAGCCATGCATTTGGCATTTGAATACAATC 1423

I I I I I I I I I : I I I I I I I I I I I II I I I I I I I I II I I I I I I I I II II I I I I I I I I I I I I I 
Db 301 CTTAAGAAGTAAGTTTAAGTTGTAGAGAGGCAGCCATGCATTAGGCATTTGAATACAATC 360

Qy 1424 TGGTGACTTGTCTGGCTGCCAATAGAACCTAGTACCAAAGTGAAATCTT 1472 

II I I I I I I I I I I I I II I I I I I I I I I I II I I I I I I I I I I I I I I I I I I II 

Db 361 TGGTGACTTGTCTGGCTGCCAATAGAACCTAGTACCAAAGTGAAATATT 409



RESULT 9 

F351799S01A 

LOCUS       AF351799S01 1314 bp DNA linear ROD 23-AUG-2002
DEFINITION  Mus musculus sterolin 2 (Abcg8) gene, exon 1.
ACCESSION   AF351799
VERSION     AF351799.1 GI:18996437
KEYWORDS
SEGMENT     1 of 13
SOURCE      Mus musculus (house mouse)
  ORGANISM  Mus musculus
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Rodentia; Sciurognathi; Muridae; Murinae; Mus.
REFERENCE   1 (bases 1 to 1314)
  AUTHORS   Lu,K., Lee,M.-H., Yu,H., Zhou,Y., Sandell,S.A., Salen,G. and
            Patel,S.B.
  TITLE     Molecular cloning, genomic organization, genetic variations, and
            characterization of murine sterolin genes Abcg5 and Abcg8
  JOURNAL   J. Lipid Res. 43 (4), 565-578 (2002)
  MEDLINE   21904563
   PUBMED   11907139
REFERENCE   2 (bases 1 to 1314)
  AUTHORS   Lu,K., Zhou,Y., Lee,M.-H. and Patel,S.B.
  TITLE     Direct Submission
  JOURNAL   Submitted (21-FEB-2001) Division of Endocrinology, Diabetes and
            Medical Genetics, Medical University of South Carolina, 114 Doughty
            St., STB 541, Charleston, SC 29403, USA
FEATURES             Location/Qualifiers
     source          1..1314
                     /organism="Mus musculus"
                     /mol_type="genomic DNA"
                     /strain="129/Sv"
                     /db_xref="taxon:10090"
                     /chromosome="17"
                     /map="between Mit41 and Mit189"
                     /clone="329B11"
     exon            <359..421
                     /gene="Abcg8"
                     /number=1
ORIGIN



Query Match 25.4%; Score 398.4; DB 10; Length 1314;
Best Local Similarity 97.9%; Pred. No. 2.6e-112;
Matches 413; Conservative 1; Mismatches 7; Indels 1; Gaps 1;



Qy   1 CGAAGCATCCTGAAGTACAGTCCCATTCCACAGCTGGGTCTCTTCTTTGGTTTTCTCAGC 60
       ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 421 CGAAGCATCCTGAAGTACAGTCCCATTCCACAGCTGGGTCTCTTCTTTGGTTTTCTCAGC 362

Qy  61 CATGACCAGTGCTGTTTGTGCCCTTTGTGTGGCCTCCCCTGCTGTTGGGCTCTCTCTGTC 120
       ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 361 CATGACCAGTGCTGTTTGTGCCCTTTGTGTGGCCTCCCCTGCTGTTGGGCTCTCTCTGTC 302

Qy 121 TTTGCTCCTTAGAGCTGGGGCACCTGAGCCCTCCTCTGTGCCAGCCTTTCTCCCAGCATT 180
       |||    | | |||||||||||| ||||||||||||||||||||||||||||||||||||
Db 301 TTTTGCTCCTTGAGCTGGGGCACATGAGCCCTCCTCTGTGCCAGCCTTTCTCCCAGCATT 242

Qy 181 CCTYTCTGGCAAACACTTCCTATAAACACACCGTGTGTTCTGCCTATTGTCGAGATAAGG 240
       |||:|||||||||||| |||||||||||||||||||||||||||||||||||||||||||
Db 241 CCTCTCTGGCAAACAC-TCCTATAAACACACCGTGTGTTCTGCCTATTGTCGAGATAAGG 183

Qy 241 ACACTCTGGCTAAAGGTACATCAGATAATGGCATCGTTGGCCAAATTGGTGAACTGTTAT 300
       ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 182 ACACTCTGGCTAAAGGTACATCAGATAATGGCATCGTTGGCCAAATTGGTGAACTGTTAT 123

Qy 301 CTCACGAGGATTCCAGGGCTGGGTAGGATCGGACAGGGCACTCCCATTGGCTCCTCAGTT 360
       ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 122 CTCACGAGGATTCCAGGGCTGGGTAGGATCGGACAGGGCACTCCCATTGGCTCCTCAGTT 63

Qy 361 AAAGCTGCCCTGGAGCCGGACAGGCCACTAGAAAATTCACTTGCATTTGCTTCCTGCTAG 420
       ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  62 AAAGCTGCCCTGGAGCCGGACAGGCCACTAGAAAATTCACTTGCATTTGCTTCCTGCTAG 3

Qy 421 CC 422
       ||
Db   2 CC 1



RESULT 10 

AX685738 

LOCUS       AX685738 359 bp DNA linear PAT 29-MAR-2003
DEFINITION  Sequence 10 from Patent WO02081691.
ACCESSION   AX685738
VERSION     AX685738.1 GI:29371747
KEYWORDS
SOURCE      Homo sapiens (human)
  ORGANISM  Homo sapiens
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Primates; Catarrhini; Hominidae; Homo.
REFERENCE   1
  AUTHORS   Hobbs,H.H., Shan,B., Barnes,R. and Tian,H.
  TITLE     Abcg5 and abcg8: compositions and methods of use
  JOURNAL   Patent: WO 02081691-A 10 17-OCT-2002;
            Tularik Inc. (US); BOARD OF REGENTS UNIVERSITY OF TEXAS SYSTEM
            (US)
FEATURES             Location/Qualifiers
     source          1..359
                     /organism="Homo sapiens"
                     /mol_type="unassigned DNA"
                     /db_xref="taxon:9606"
                     /note="sequence between ABCG5 and ABCG8 containing control
                     sequences (bidirectional promoter)"
ORIGIN



Query Match 22.8%; Score 358.6; DB 6; Length 359;
Best Local Similarity 100.0%; Pred. No. 6.6e-100;
Matches 359; Conservative 0; Mismatches 0; Indels 0; Gaps 0;



Qy  64 GACCAGTGCTGTTTGTGCCCTTTGTGTGGCCTCCCCTGCTGTTGGGCTCTCTCTGTCTTT 123
       ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   1 GACCAGTGCTGTTTGTGCCCTTTGTGTGGCCTCCCCTGCTGTTGGGCTCTCTCTGTCTTT 60

Qy 124 GCTCCTTAGAGCTGGGGCACCTGAGCCCTCCTCTGTGCCAGCCTTTCTCCCAGCATTCCT 183
       ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  61 GCTCCTTAGAGCTGGGGCACCTGAGCCCTCCTCTGTGCCAGCCTTTCTCCCAGCATTCCT 120

Qy 184 YTCTGGCAAACACTTCCTATAAACACACCGTGTGTTCTGCCTATTGTCGAGATAAGGACA 243
       ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 121 YTCTGGCAAACACTTCCTATAAACACACCGTGTGTTCTGCCTATTGTCGAGATAAGGACA 180

Qy 244 CTCTGGCTAAAGGTACATCAGATAATGGCATCGTTGGCCAAATTGGTGAACTGTTATCTC 303
       ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 181 CTCTGGCTAAAGGTACATCAGATAATGGCATCGTTGGCCAAATTGGTGAACTGTTATCTC 240

Qy 304 ACGAGGATTCCAGGGCTGGGTAGGATCGGACAGGGCACTCCCATTGGCTCCTCAGTTAAA 363
       ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 241 ACGAGGATTCCAGGGCTGGGTAGGATCGGACAGGGCACTCCCATTGGCTCCTCAGTTAAA 300

Qy 364 GCTGCCCTGGAGCCGGACAGGCCACTAGAAAATTCACTTGCATTTGCTTCCTGCTAGCC 422
       |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 301 GCTGCCCTGGAGCCGGACAGGCCACTAGAAAATTCACTTGCATTTGCTTCCTGCTAGCC 359



RESULT 11 
AC146466/c 



LOCUS       AC146466 185045 bp DNA linear HTG 15-AUG-2003
DEFINITION  Callithrix jacchus clone CH259-274K20, WORKING DRAFT SEQUENCE, 3
            ordered pieces.
ACCESSION   AC146466
VERSION     AC146466.1 GI:33667132
KEYWORDS    HTG; HTGS_PHASE2; HTGS_DRAFT.
SOURCE      Callithrix jacchus (white-tufted-ear marmoset)
  ORGANISM  Callithrix jacchus
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Primates; Platyrrhini; Callitrichidae;
            Callithrix.
REFERENCE   1 (bases 1 to 185045)
  AUTHORS   Cheng,J.-F., Hamilton,M., Peng,Y., Mukherjee,S., Hosseini,R.,
            Peng,Z., Malinov,I. and Rubin,E.M.
  TITLE     Direct Submission
  JOURNAL   Unpublished
REFERENCE   2 (bases 1 to 185045)
  AUTHORS   Cheng,J.-F., Hamilton,M., Peng,Y., Mukherjee,S., Hosseini,R.,
            Peng,Z., Malinov,I. and Rubin,E.M.
  TITLE     Direct Submission
  JOURNAL   Submitted (15-AUG-2003) Genome Sciences, Lawrence Berkeley National
            Laboratory, 1 Cyclotron Rd., Berkeley, CA 94720, USA
COMMENT     Sequence Produced by Berkeley PGA
            Web site: http://pga.lbl.gov
            Center Code: PGABERK
            Center Project Name: J027
            Bac Clone Name: CH259-274K20

            This sequence has been compared to sequences of other species
            using Vista (http://www-gsd.lbl.gov/VISTA). The results can be
            viewed at:
            http://pga.lbl.gov/cgi-bin/search_cvcgd?type=n&value=ABCG5

            The order-orientation of the draft sequence was accomplished by
            using:
            Avid (http://baboon.math.berkeley.edu/mavid),
            Lagan (http://lagan.stanford.edu/) and paired end information.

            Funding agent: Programs for Genomic Applications (NHLBI)

            Summary Statistics:
            Sequencing vector: Plasmid; pUC18
            Chemistry: Dye-terminator Big Dye
            Assembly program: Phrap version 0.990329.

            * NOTE: This is a 'working draft' sequence. It currently
            * consists of 3 contigs. Gaps between the contigs
            * are represented as runs of N. The order of the pieces
            * is believed to be correct as given, however the sizes
            * of the gaps between them are based on estimates that have
            * provided by the submittor.
            * This sequence will be replaced
            * by the finished sequence as soon as it is available and
            * the accession number will be preserved.
            *     1  49109: contig of 49109 bp in length
            * 49110  49209: gap of unknown length
            * 49210  57420: contig of 8211 bp in length
            * 57421  57520: gap of unknown length
            * 57521 185045: contig of 127525 bp in length.
FEATURES             Location/Qualifiers
     source          1..185045
                     /organism="Callithrix jacchus"
                     /mol_type="genomic DNA"
                     /db_xref="taxon:9483"
                     /clone="CH259-274K20"
ORIGIN



Query Match 19.1%; Score 299.4; DB 2; Length 185045; 

Best Local Similarity 57.7%; Pred. No. 2.6e-81;

Matches 824; Conservative 2; Mismatches 513; Indels 88; Gaps 13; 

Qy 120 CTTTGCTCCTTAGAGCTGGGGCACCTGAGCCCTCCTCTGTGCCAGCCTTTCTCCCAGCAT 179

I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I 

Db 116200 CTCTGTTTCCTGGAGCAGGGACACCTCAGCCTCCTGCCCTGGGCCCGGCTCTCCCAGCAT 

116141 



Qy 180 TCCTYTCTGGCAAACACTTCCTATAAACACACCGTGTGTTCTGCCTATTGTCGAGATAAG 239

I I I I : I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I 
Db 11614 0 T C CT CT CT GGCAAGC CCA- CCT ACAAACACAT - GT GT GTT CT GC C CT CT CT CAAGATAAG 

116083 



Qy 240 GACACTCTGGCTAAAGGTACATCAGATAATGGCATCGTTGGCCAAATTGGTGAACTGTTA 299

I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I III I 

Db 116082 GACGCGCTGGCTAAAGGTACATCAGATAACGGCCTTCTTGGCCAAGTCCCAGTCCTGCCA 

116023 



Qy 300 TCTCACGAGGATTCCAGGGCTGGGTAGGATCGGACAGGGCACTCCCATTGGCTCCTCAGT 359 

II II I I I I I I I I I I I I I I I I I I I I II I I I 

Db 116022 TCCTGAGGGACTCTGGGGTCAGGTGGAGCTGGCAGGGCAGTCTGCCACTGGCTCCCCAAC 

115963 



Qy 360 TAAAGCTGCCCTGGAGCCGGACAGGCCACTAGAAAATTCACTTGCATTTGCTTCCTGCTA 419 

I III I I I I I I I I II I I I I I II II I I I I I I I I I I I I I 

Db 115962 TGCAGCCACTCTGAGGAGGGTCAGGCTACCAGAAAATCTGCCCAGCTTTGCTGCCCGTTG 

115903 



Qy 420 GCCATGGGTGAGCTGCCCTTTCTGAGTCCAGAGGGAGCCAGAGGGCCTCACATCAACAGA 479

I I I I I I I I I I I M I I I I I I I III II II I III I II I I I I I I I 
Db 115902 GCCATGGGTGACCTTCCATCTTTGACCCCCGGAGGGTCCATAGGACTCCAGGTAAACAGA 

115843 



Qy 480 GGGTCTCTGAGCTCCCTGGAGCAAGGTTCGGTCACGGGCACAGAGGCTCGGCACAGCTTA 539 

I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I 

Db 115842 GGCTCCCAGAGCTCCCTGGAGGGGGCTCCTGCCACTGCACCTGAGCCT CACAGTCTG 

115786 



Qy 54 0 GGT GT CCT GCAT GT GT C CT AC AGCGT CAGGT AAGG GGACCTCCA 583 

II I I I I I I I I I I I I I I M I I I I I I II II I I I I I I I 

Db 1157 8 5 GGCATCCTCCATGCCTCCTACAGCATCAGGTAAGGCAGAGACCTTGCTGCTGCTCCTCCC 

115726 



Qy 584 CAGCAAAAAGCTAGGCTCTCTGATTGCCTTTTCTGAATGGGTGGGTGGGCCTGTGGGCTT 643 

Mil II I I II III III I I I I I I II 



Db 115725 CAGGAGCACGGGGCCCTCTGCTGCCTTTTTTCACTCTTGAGCTGCCTGGCTGGAGACTTT 

115666 



Qy 644 T GGGTT GT CT GT CCAGCAGAT CAGGGT GAAAGT GGACAGT CT GTAACAACAGT GAGT C GT 703 

I I I I I I I I II I I I I I I I I I I I I I I I II 

Db 115665 GGGGCTCCCTCTTCAGTGGATCAGGTGGAGAGAAGAGAGGGGGAAGGG 

115618 



Qy 704 TCCTCCTCCTCCTCCTGCGCAGGGCAGAGCCTGGACATTAAAACATGCCCTGCCTGAAGC 7 63 

II I I I I I I I I I I I I I I I I I I I 

Db 115617 CTGCACTGGGAAATAGGGAGCAACAGTAAATGGCCCCTCCCCCTGCCCAGGGA 

115565 



Qy 764 CGCTTGCTGCTTCTCACTGATTTCTGCTCTCCCCTTCCTTGACTCGC--CCACCACCTGT 821 

I II II III I I I I I I I I I I I II I I I I I I I II 

Db 115564 AGGGCCTAGGTATAAACAAAGTTGAGCTGTGCCCTGCCTACCCTAGTGTCTACCACTTGC 

115505 



Qy 822 CCTGTGTAGATGGAGAAGGCTCGGAGAGTGGGGGTGCTGGGGGCACAAAATGGAATGAAC 881 

I I I I I I I I I I I I I I III I I I I I IE I II I 

Db 115504 CCTCTGCAGATGGAGAGAATCTGGGGCCTGGGGAGCTGGGAATAAAGGAGTCTTGAATCC 

115445 



Qy 882 A-CTGCTGAAGGAATGCAGGGTTCACTTCAAGAAGAAAGCAGTGTGCAGGTGTACCATCT 940 

I II I I I I I I I I I I I I I III I II I I I I I I I 

Db 1154 44 AG GT GAC GAAT GT AGG GACAACC AC CT C C C AGAC AAAT GGGC AGGACAT T T G GAGCAG C T 

115385 



Qy 941 CCCAGTCAGAGACCCAGTAATCAGAGCAGCTAATGGGAGGCATGCTCCTTGGGTGGTGGC 1000 

II III III I I I I I I I I I I I I I II 

Db 115384 CCAGCACAGGCCCCC T C C C T AGGT GACAGACAGC C T CAGT C GCT ACC T G C 

115335 



Qy 1001 C AAC T T GT CAT TAT AC C T C C AAG G AC AAC AGAGT GGT AC AT AAG G CT AAAAC AG AGT T GT 1060 

I I I I I I II I I I I I I I I II II 

Db 115334 C AG G T T C T AC AG A GAAGGATGCCGAGGCTGAAACACGTTAGGAGCCTGTCTGA 

115282 



Qy 1061 CAACCTGTCCAGGGGCAACTGGGATGGGGTAGGGCTGGGAGCAGGGGTCTGGCACCTTCC 1120 

I I I I I I II II I I I I II I I I I I I II I I I I I I 

Db 1152 8 1 AGAT AACT G G G GT G G GAC ACAG GT G GGAT C AAT GCT GGG GAC C CAGGT GT AG CCCCTTCC 

115222 



Qy 1121 AGGACCCTACTCTGCCTTTGCCCTTGTGGGATTTCCTTTAAAGCAACCGTGTCGGGCCTT 1180 

III III I I M I I I I I I I I I I I I I II I I I I I II I I I I I I I I I I I I I I I I I 
Db 115221 AGGGCCCCATGCTGCCTTTGCTTTCCTGGGATTTCCTTTAAAGCCACCGTGTGGGGCCCT 

115162 



Qy 1181 GGT GGAAC AT CAAAT CAT GC C AGCAGAAGT G GGACAG G CAAAT C C T C AAAGAT GT CT C CT 124 0 

Mill I I II I I Ml I I II I I I Mill MIMI II I I I I II I I I I I II I II 
Db 115161 GGT GG G ACAT C ACAT CT T GC C AGC GACAGTGGAC CAG G C AGAT C C T CAAAGAC GT C T C CT 

115102 



Qy 1241 TGTACATCGAGAGTGGCCAGATTATGTGCATCTTAGGCAGCTCAGGTAAGTGCCTGGGGG 1300 

II I I I I I I I I I II I I I II I I II I I I II I I II I II M I I I I I I I I I I 

Db 115101 TGT AC GTGGAGAGTGGGCAGAT CAT GTGCATCCTAGGAAGCT CAGGT AAGCTTGGGAAGA 

115042 



Qy 1301 GSCSGGGGCTCCTGTACTTCTAAGGCAGGCTCTGGGAGGCTTTGGCTCYGTCTAAGCACA 1360 

: I I I I I I I I I I I I I I I I I 

Db 115041 AG GATTTTAAAAAGGCTTTGGCTTGAGTTAAACTCC 

115006 

Qy 1361 AT GT T T AAGAAGT RAGT T T AAGT T GT AGAGAG G C AGC CAT G CAT T T G GCAT T T GAAT AC A 142 0 

I I I I I I I I III I I I I I I I I II I I I I I I I I I I I I I I | I 

Db 115005 AC C T T AAAGAA- AC AGAT AC AGT T G T AG C AAGAAAAC C AC AG G T T T GAT AT T AGAAT GAA 

114947 

Qy 1421 AT C T G GT GAC TTGTCTGGCT GC C AAT AGAAC C T AGT AC CAAAGT GAAAT CT T GAG GAAAA 1480 

I I I I I I I I I I I I I I I I I I I I I I M II I I I I I I I I I I I I I II I I I I I 
Db 114946 ATCTAATGA — T GT C T GACT GT GAAT AGAAC C T GCT AC CAAT GT GAAAT C T AT AGAAAG A 

114889 

Qy 1481 TCCCTGGAAAGAGTGGAAAGTCCTGCCTAACACGTAAGTGCCTTCTT 1527 

I I I I II I I I I I I I I II I I I I I I I I I I I I I M || | M | 
Db 114888 T - C C T GGAAAGAGT ATAAAAT CC T G C CT AACAT GT AC AT GAAT T CAT 114843 



RESULT 12
AC146787/c
LOCUS       AC146787 178016 bp DNA linear HTG 03-OCT-2003
DEFINITION  Aotus nancymaae clone CH258-323A5, WORKING DRAFT SEQUENCE, 4
            ordered pieces.
ACCESSION   AC146787
VERSION     AC146787.1 GI:37497135
KEYWORDS    HTG; HTGS_PHASE2; HTGS_DRAFT.
SOURCE      Aotus nancymaae (Ma's night monkey)
  ORGANISM  Aotus nancymaae
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Primates; Platyrrhini; Cebidae; Aotinae; Aotus.
REFERENCE   1 (bases 1 to 178016)
  AUTHORS   Cheng,J.-F., Hamilton,M., Peng,Y., Mukherjee,S., Hosseini,R.,
            Peng,Z., Malinov,I. and Rubin,E.M.
  TITLE     Direct Submission
  JOURNAL   Unpublished
REFERENCE   2 (bases 1 to 178016)
  AUTHORS   Cheng,J.-F., Hamilton,M., Peng,Y., Mukherjee,S., Hosseini,R.,
            Peng,Z., Malinov,I. and Rubin,E.M.
  TITLE     Direct Submission
  JOURNAL   Submitted (03-OCT-2003) Genome Sciences, Lawrence Berkeley National
            Laboratory, 1 Cyclotron Rd., Berkeley, CA 94720, USA
COMMENT     Sequence Produced by Berkeley PGA
            Web site: http://pga.lbl.gov
            Center Code: PGABERK
            Center Project Name: W010
            Bac Clone Name: CH258-323A5
            This sequence has been compared to sequences of other species
            using Vista (http://www-gsd.lbl.gov/VISTA). The results can be
            viewed at:
            http://pga.lbl.gov/cgi-bin/search_cvcgd?type=n&value=ABCG5
            The order-orientation of the draft sequence was accomplished by
            using:
            Avid (http://baboon.math.berkeley.edu/mavid),
            Lagan (http://lagan.stanford.edu/) and paired end information.
            Funding agent: Programs for Genomic Applications (NHLBI)
            Summary Statistics:
            Sequencing vector: Plasmid; pUC18
            Chemistry: Dye-terminator Big Dye
            Assembly program: Phrap version 0.990329.
            * NOTE: This is a 'working draft' sequence. It currently
            * consists of 4 contigs. Gaps between the contigs
            * are represented as runs of N. The order of the pieces
            * is believed to be correct as given, however the sizes
            * of the gaps between them are based on estimates that have
            * provided by the submittor.
            * This sequence will be replaced
            by the finished sequence as soon as it is available and
            the accession number will be preserved.
                 1  32150: contig of 32150 bp in length
             32151  32250: gap of unknown length
             32251  56222: contig of 23972 bp in length
             56223  56322: gap of unknown length
             56323 173105: contig of 116783 bp in length
            173106 173205: gap of unknown length
            173206 178016: contig of 4811 bp in length.
FEATURES             Location/Qualifiers
     source          1..178016
                     /organism="Aotus nancymaae"
                     /mol_type="genomic DNA"
                     /db_xref="taxon:37293"
                     /clone="CH258-323A5"
ORIGIN

Query Match 19.0%; Score 298.8; DB 2; Length 178016;
Best Local Similarity 56.2%; Pred. No. 4.1e-81;
Matches 869; Conservative 2; Mismatches 584; Indels 91; Gaps 13;



Qy 1 CGAAGCATCCTGAAGTACAGTCCCATTCCACAGCTGGGTCTCTTCTTTGGTTTTCTCAGC 60 

I I I M I I II I I I I I I I I I I I I I I III I I I I I I I 

Db 90827 CAAGGCATCCTGGGGAGTGGCCCCTTTCGGCAGCCCTCTCTCCTCGGTGGCCTTCCCAGC 90768



Qy 61 CAT GACCAGTGCTGTTTGTGCCCTTTGTGTGGCCTCCCCTGCTGTTGGGC 110

Ml I I I I I I I I I I I I I I I I I | | | | | ||

Db 90767 CATGGGGCCCACAGGTCTGTGCCGTTGGGCTCAGCTCTTAGACCGGGGCTGCTGCCTGTC 90708

Qy 111 TCTCTCTGT CTTTGCTCCTTAGAGCTGGGGCACCTGAGCCCTCCTCTGTGCCA 163

I II I I I I I I I I I I I I I I I I I I I I M I III

Db 90707 AGGGCCAGTGTCTTCGCTCTGTTTCCTGGAGCAGGGACACATCAGCCTCCTGTTCTGGGC 90648

Qy 164 GCCTTTCTCCCAGCATTCCTYTCTGGCAAACACTTCCTATAAACACACCGTGTGTTCTGC 223

I I I I I I I I I I I I I I I I : I I I I I I I I III Ml II I I I I I I I I I I

Db 90647 CCGGCTCTCCCAGCATTCCTCTCTGGCAAG T C CAC CT AAAAAC AC GT GT GTT CT GC 90592

Qy 224 CTATTGTCGAGATAAGGACACTCTGGCTAAAGGTACATCAGATAATGGCATCGTTGGCCA 283

I I II I I I I II I I I I I II I I I M I I I I I I I I I I I I I II I III II I I I I I I I

Db 90591 CCTCTCTCAAGATAAGGACGCGCTGGCTAAAGGTACATCAGATAACGGCCTCCTTGGCCA 90532



Qy 284 AATTGGTGAACTGTTATCTCACGAGGATTCCAGGGCTGGGTAGGATCGGACAGGGCACTC 343

III I I I I I I I I I I I I I I I I I I I I I II 

Db 90531 AGTTCCAGTCCTGCCATCCTGAGGGACTCCGGGGTCAGGTGGAGCTGGCAGGGCAGTCTG 90472

Qy 344 CCATTGGCTCCTCAGTTAAAGCTGCCCTGGAGCCGGACAGGCCACTAGAAAATTCACTTG 403 

M I I I I II I I I I III III I I I I I I II II I I I I I I I I 
Db 90471 CCACTGGCTTCCCAACTGCAGCCACTCCGAGGAGGGTCAGGCTACCAGAAAATCTGCCCA 90412 

Qy 404 CATTTGCTTCCTGCTAGCCATGGGTGAGCTGCCCTTTCTGAGTCCAGAGGGAGCCAGAGG 463

I I I I I I II I I I I I I I I II I I I I I I I I I I I I I II I I I I I I 
Db 90411 GCTTTGCTGCCCGTTGGCCATGGGTGACCTTCCATCTTTAACCCTCGGAGGGTCCATAGG 90352 

Qy 464 GCCTCACATCAACAGAGGGTCTCTGAGCTCCCTGGAGCAAGGTTCGGTCACGGGCACAGA 523

I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I III I III 
Db 90351 ACTCCAGGTAAACAGAGGCTCCCAGAGCTCCCTGGAGGGGGCTCCTGCCACTGCACCTGA 90292 

Qy 524 GGCTCGGCACAGCTTAGGTGTCCTGCATGTGTCCTACAGCGTCAGGTAAGGGGACCTCCA 583 

III I I I I I III I I I I I I I I I I I I I I I I I II I I I I I M I I II 
Db 90291 GCCT CACAGTCTGGGCATCCTCCATGCCTCCTACAGCGTCAGGTAAGGCAGAGCCCT 90235 

Qy 584 CAGCAAAAAGCTAGGCTCTCTGATTGCCTTTTCTGAATGGGTGGGTGGGCCTGTGGGCTT 643 

I III I I I I II I I I I I I I I I I 

Db 90234 TGCTGCTGCTC CTCCCCAGGAGCACGGTTCACTCTTGAGCTGCCTGGCTGGGGACTT 90178 

Qy 644 TGGGTT GT CT GT CCAGCAGAT CAGGGT GAAAGT GGACAGT CT GTAACAACAGT GAGT C GT 703 

I I I I I I I I I I I I I I I I I I I I I II I I II 
Db 90177 TGGGCTCCCTCTTCAGTGGATCGGGTGGAGAGAAGAGAGCGGGGA 90133 

Qy 704 TCCTCCTCCTCCTCCTGCGCAGGGCAGAGCCTGGACATTAAAACATGCCCTGCCTGAAGC 763

II I I I I I I I I I I I I I I I I I I I I 

Db 90132 GGGCTGCACTGGGAAATGGGGAGCAACAGTAAATGGCCCCTCCCCCTGCCCAGGGA 90077 

Qy 764 CGCTTGCTGCTTCTCACTGATTTCTGCTCTCCCCTTCCTTGAC — TCGCCCACCACCTGT 821 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

Db 90076 AGGGTCTGGTATAAACAAAGTTGCAGCTGTGCCCTGCCTACCCCAGTGTCTACCACTTGC 90017 

Qy 822 CCT GT GT AGAT GGAGAAGGCT CGGAGAGT GGGGGT GCT GGGGGCACAAAAT GGAAT GAAC 881 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

Db 90016 CCTCTGCAGATGGAGAGAATCTGGGGAATCGGGG-GCTGGGAATGCAAAGAGTCTTGAAT 89958

Qy 882 AC T G C T GAAG GAAT G C AG GGT T CACT T C AAGAAGAAAGCAGT GT GC AG GT GT AC CAT CT C 941 

I I I I I I I III I I I I I III I I 

Db 89957 CCAGGTGACGAA T GC AG G GAC AAC C ACT T C C C AG AC AAAT G G G C AG 89912 

Qy 942 CCAGTCAGAGACCCAGTAATCAGAGCAGCTAATGGGAGGCATGCTCCTTGGGTGGTGGCC 1001

II I I I I I I I III I I I I I I 

Db 8 9911 GACAT T C G GAG C AG C T C C AG C AC AGG C CCCCTCCC T AGGT GAC AGACAG C CT C G GT AGCT 8 9 852 

Qy 1002 AACTT GT CATT AT AC CT C CAAGGACAACAGAGT GGTACATAAGGCTAAAACAGAGT T GT C 10 61 

III I I I I I I I I I I I II II 

Db 89851 AC CT G C CAGGT T C T AC AG AGGAG GAT G C C G AGGC T GAAA.C AC GT TAG GAG CCT GT C T GAA 89792 

Qy 1062 AACCTGTCCAGGGGCAACTGGGATGGGGTAGGGCTGGGAGCAGGGGTCTGGCACCTTCCA 1121

I I I II I II II I I I I I I I I I I I I I I I II I I I I I 

Db 89791 GATAACTGGGGTGGGACACAGGTGGGATCAACGCTGGGGACCTGGGTGTAGCCCCTTCCA 89732 



Qy 1122 GGACCCTACTCTGCCTTTGCCCTTGTGGGATTTCCTTTAAAGCAACCGTGTCGGGCCTTG 1181 

|| Ml I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II 
Db 89731 GGGCCCCATGCTGCCTTTGCCTTCCTGGGATTTCCTTTAAAGCCACCGTGTGGGGCCCTG 89672 

Qy 1182 GT G GAAC AT CAAAT CAT GC C AG C AGAAGT G GGAC AGGC AAAT C C T CAAAGAT GT CT C C T T 1241 

I M I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I II I I I I I I I I I I 
Db 89671 GTGGGACATCACATCTTGCTGGCGACAGTGGACCAGGCAGATCCTCAAAGACGTCTCCTT 89612 

Qy 1242 GTACATCGAGAGTGGCCAGATTATGTGCATCTTAGGCAGCTCAGGTAAGTGCCTGGGGGG 1301 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I 1 I I I I II I I II I 1 I I I 

Db 89611 GTACGTGGAGAGCGGGCAGATCATGTGCATCCTAGGAAGCTCAGGTAAGCTTGGGAAGAA 89552 

Qy 1302 SCSGGGGCTCCTGTACTTCTAAGGCAGGCTCTGGGAGGCTTTGGCTCYGTCTAAGCACAA 1361 

: I I I I I I II I I I I I II I ! I 

Db 8 9551 G GATTTTAAAAAGGCTTTGGCTTGAGTTAAACTCCA 89516 

Qy 1362 T GT T T AAGAAGT RAGT T T AAGT T GT AGAGAG GC AGC CAT GCAT T T GGC AT T T GAAT AC AA 1421 

I I I I I I III I I I I I II I II I I I I I I I M II I I I I I II 

Db 89515 C C C T GAAGAA- AC AGAT AC AG T T G T AG C AAGAAAG C C AC AG GT T T GAT AT T AGAAT GAAA 89457 

Qy 1422 T CT G GT GACT T GTCTGGCTGC CAAT AGAAC C T AGT AC CAAAGT GAAAT CT T GAGGAAAAT 1481 

I I I III I I I I II I I I I II I I I I I I I I I I I I I I I I I I I I II II II li 
Db 89456 TCTAATGA — T GT CT GACT GT GAAT AGAACCT GCTAC CAAT GT GAAAT CTAT AGAAAGAT 89399 

Qy 1482 CCCTGGAAAGAGTGGAAAGTCCTGCCTAACACGTAAGTGCCTTCTT 1527

I I I I I I I I II I I I I I I I I I I I I I I I II I I I II III I 
Db 89398 - C C T G GAAAGAGT AT AAAAT C C T GC CT AAC AT GT AC AT GAAT T CAT 89354 



RESULT 13
AY195873
LOCUS       AY195873 2351 bp mRNA linear ROD 01-JUN-2003
DEFINITION  Mus musculus strain PERA/Ei ATP-binding cassette sub-family G
            member 5 (Abcg5) mRNA, complete cds.
ACCESSION   AY195873
VERSION     AY195873.1 GI:31322257
KEYWORDS
SOURCE      Mus musculus (house mouse)
  ORGANISM  Mus musculus
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Rodentia; Sciurognathi; Muridae; Murinae; Mus.
REFERENCE   1 (bases 1 to 2351)
  AUTHORS   Wittenburg,H., Lyons,M.A., Li,R., Churchill,G.A., Carey,M.C. and
            Paigen,B.
  TITLE     Primary Roles of FXR and ABCG5/ABCG8 in Cholesterol Gallstone
            Susceptibility: Evidence from a Cross of PERA/Ei and I/Ln Inbred
            Mice
  JOURNAL   Unpublished
REFERENCE   2 (bases 1 to 2351)
  AUTHORS   Lyons,M.A., Wittenburg,H., Walsh,K.A., Carey,M.C. and Paigen,B.
  TITLE     Direct Submission
  JOURNAL   Submitted (11-DEC-2002) The Jackson Laboratory, 600 Main Street,
            Bar Harbor, ME 04609, USA
FEATURES             Location/Qualifiers
     source          1..2351
                     /organism="Mus musculus"
                     /mol_type="mRNA"
                     /strain="PERA/Ei"
                     /db_xref="taxon:10090"
                     /chromosome="17"
                     /map="55 cM"
                     /sex="male"
                     /tissue_type="liver"
     gene            1..2351
                     /gene="Abcg5"
     CDS             139..2097
                     /gene="Abcg5"
                     /note="ATP-dependent canalicular cholesterol transporter;
                     white subfamily"
                     /codon_start=1
                     /product="ATP-binding cassette sub-family G member 5"
                     /protein_id="AAO45094.1"
                     /db_xref="GI:31322258"
                     /translation="MGELPFLSPEGARGPHINRGSLSSLEQGSVTGTEARHSLGVLHV
                     SYSVSNRVGPWWNIKSCQQKWDRQILKDVSLYIESGQIMCILGSSGSGKTTLLDAISG
                     RLRRTGTLEGEVFVNGCELRRDQFQDCFSYVLQSDVFLSSLTVRETLRYTAMLALCRS
                     SADFYNKKVEAVMTELSLSHVADQMIGSYNFGGISSGERRRVSIAAQLLQDPKVm
                     EPTTGLDCMTANQIVLLLAELARRDRIVIVTIHQPRSELFQHFDKIAILTYGELVFCG
                     TPEEMLGFFNNCGYPCPEHSNPFDFYMDLTSVDTQSREREIETYKRVQMLECAFKESD
                     IYHKILENIERARYLKTLPTVPFKTKDPPGMFGKLGVLLRRVTRNLMRNKQAVIMRLV
                     QNLIMGLFLIFYLLRVQNNTLKGAVQDRVGLLYQLVGATPYTGMLNAVNLFPMLRAVS
                     DQESQDGLYHKWQMLLAYVLHVLPFSVIATVIFSSVCYWTLGLYPEVARFGYFSAALL
                     APHLIGEFLTLVLLGIVQNPNIVNSIVALLSISGLLIGSGFIRNIQEMPIPLKILGYF
                     TFQKYCCEILWNEFYGLNFTCGGSNTSMLNHPMCAITQGVQFIEKTCPGATSRFTAN
                     FLILYGFIPALVILGIVIFKVRDYLISR"
ORIGIN

Query Match 18.1%; Score 284.4; DB 10; Length 2351;
Best Local Similarity 98.0%; Pred. No. 1.1e-76;
Matches 288; Conservative 0; Mismatches 6; Indels 0; Gaps 0;

Qy        285 ATTGGTGAACTGTTATCTCACGAGGATTCCAGGGCTGGGTAGGATCGGACAGGGCACTCC 344
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db          1 ATTGGTGAACTGTTATCTCACGAGGATTCCAGGGCTGGGTAGGATCGGACAGGGCACTCC 60

Qy        345 CATTGGCTCCTCAGTTAAAGCTGCCCTGGAGCCGGACAGGCCACTAGAAAATTCACTTGC 404
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db         61 CATTGGCTCCTCAGTTAAAGCTGCCCTGGAGCCGGACAGGCCACTAGAAAATTCACTTGC 120

Qy        405 ATTTGCTTCCTGCTAGCCATGGGTGAGCTGCCCTTTCTGAGTCCAGAGGGAGCCAGAGGG 464
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db        121 ATTTGCTTCCTGCTAGCCATGGGTGAGCTGCCCTTTCTGAGTCCAGAGGGAGCCAGAGGG 180

Qy        465 CCTCACATCAACAGAGGGTCTCTGAGCTCCCTGGAGCAAGGTTCGGTCACGGGCACAGAG 524
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db        181 CCTCACATCAACAGAGGGTCTCTGAGCTCCCTGGAGCAAGGTTCGGTCACGGGCACAGAG 240

Qy        525 GCTCGGCACAGCTTAGGTGTCCTGCATGTGTCCTACAGCGTCAGGTAAGGGGAC 578
              ||||||||||||||||||||||||||||||||||||||||||||  |  | | |
Db        241 GCTCGGCACAGCTTAGGTGTCCTGCATGTGTCCTACAGCGTCAGCAACCGTGTC 294



RESULT 14
AX456524
LOCUS       AX456524 2354 bp DNA linear PAT 06-JUL-2002
DEFINITION  Sequence 46 from Patent WO0227016.
ACCESSION   AX456524
VERSION     AX456524.1 GI:21715413
KEYWORDS
SOURCE      synthetic construct
  ORGANISM  synthetic construct
            artificial sequences.
REFERENCE   1
  AUTHORS   Patel,S.B. and Dean,M.
  TITLE     Gene involved in dietary sterol absorption and excretion and uses
            therefor
  JOURNAL   Patent: WO 0227016-A 46 04-APR-2002;
            THE DEPARTMENT OF HEALTH AND HUMAN SERVICES (US); Patel,
            Shailendra B. (US); Dean, Michael (US)
FEATURES             Location/Qualifiers
     source          1..2354
                     /organism="synthetic construct"
                     /mol_type="unassigned DNA"
                     /db_xref="taxon:32630"
                     /note="Primer"
ORIGIN



Query Match 18.1%; Score 284.4; DB 6; Length 2354;
Best Local Similarity 98.0%; Pred. No. 1.1e-76;
Matches 288; Conservative 0; Mismatches 6; Indels 0; Gaps 0;

Qy        285 ATTGGTGAACTGTTATCTCACGAGGATTCCAGGGCTGGGTAGGATCGGACAGGGCACTCC 344
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db          1 ATTGGTGAACTGTTATCTCACGAGGATTCCAGGGCTGGGTAGGATCGGACAGGGCACTCC 60

Qy        345 CATTGGCTCCTCAGTTAAAGCTGCCCTGGAGCCGGACAGGCCACTAGAAAATTCACTTGC 404
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db         61 CATTGGCTCCTCAGTTAAAGCTGCCCTGGAGCCGGACAGGCCACTAGAAAATTCACTTGC 120

Qy        405 ATTTGCTTCCTGCTAGCCATGGGTGAGCTGCCCTTTCTGAGTCCAGAGGGAGCCAGAGGG 464
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db        121 ATTTGCTTCCTGCTAGCCATGGGTGAGCTGCCCTTTCTGAGTCCAGAGGGAGCCAGAGGG 180

Qy        465 CCTCACATCAACAGAGGGTCTCTGAGCTCCCTGGAGCAAGGTTCGGTCACGGGCACAGAG 524
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db        181 CCTCACATCAACAGAGGGTCTCTGAGCTCCCTGGAGCAAGGTTCGGTCACGGGCACAGAG 240

Qy        525 GCTCGGCACAGCTTAGGTGTCCTGCATGTGTCCTACAGCGTCAGGTAAGGGGAC 578
              ||||||||||||||||||||||||||||||||||||||||||||  |  | | |
Db        241 GCTCGGCACAGCTTAGGTGTCCTGCATGTGTCCTACAGCGTCAGCAACCGTGTC 294



RESULT 15
AF312713
LOCUS       AF312713 2354 bp mRNA linear ROD 16-MAY-2001
DEFINITION  Mus musculus sterolin (Abcg5) mRNA, complete cds.
ACCESSION   AF312713
VERSION     AF312713.2 GI:14091944
KEYWORDS
SOURCE      Mus musculus (house mouse)
  ORGANISM  Mus musculus
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Rodentia; Sciurognathi; Muridae; Murinae; Mus.
REFERENCE   1 (bases 1 to 2354)
  AUTHORS   Lee,M.H., Lu,K., Hazard,S., Yu,H., Shulenin,S., Hidaka,H.,
            Kojima,H., Allikmets,R., Sakuma,N., Pegoraro,R., Srivastava,A.K.,
            Salen,G., Dean,M. and Patel,S.B.
  TITLE     Identification of a gene, ABCG5, important in the regulation of
            dietary cholesterol absorption
  JOURNAL   Nat. Genet. 27 (1), 79-83 (2001)
  MEDLINE   20578753
   PUBMED   11138003
REFERENCE   2 (bases 1 to 2354)
  AUTHORS   Lu,K., Lee,M.-H. and Patel,S.B.
  TITLE     Direct Submission
  JOURNAL   Submitted (12-OCT-2000) Division of Endocrinology, Diabetes and
            Medical Genetics, Medical University of South Carolina, 114 Doughty
            St, STB 541, Charleston, SC 29403, USA
REFERENCE   3 (bases 1 to 2354)
  AUTHORS   Lu,K., Lee,M.-H. and Patel,S.B.
  TITLE     Direct Submission
  JOURNAL   Submitted (16-MAY-2001) Division of Endocrinology, Diabetes and
            Medical Genetics, Medical University of South Carolina, 114 Doughty
            St, STB 541, Charleston, SC 29403, USA
  REMARK    Sequence update by submitter
COMMENT     On May 16, 2001 this sequence version replaced gi:12382299.
FEATURES             Location/Qualifiers
     source          1..2354
                     /organism="Mus musculus"
                     /mol_type="mRNA"
                     /strain="C57BL/6"
                     /db_xref="taxon:10090"
                     /tissue_type="liver"
     gene            1..2354
                     /gene="Abcg5"
     CDS             139..2097
                     /gene="Abcg5"
                     /note="ABCG5"
                     /codon_start=1
                     /product="sterolin"
                     /protein_id="AAG53097.1"
                     /db_xref="GI:12382300"
                     /translation="MGELPFLSPEGARGPHINRGSLSSLEQGSVTGTEARHSLGVLHV
                     SYSVSNRVGPWWNIKSCQQKWDRQILKDVSLYIESGQIMCILGSSGSGKTTLLDAISG
                     RLRRTGTLEGEVFVNGCELRRDQFQDCFSYVLQSDVFLSSLTVRETLRYTAMLALCRS
                     SADFYNKKVEAVMTELSLSHVADQMIGSYNFGGISSGERRRVSIAAQLLQDPKVMMLD
                     EPTTGLDCMTANQIVLLLAELARRDRIVIVTIHQPRSELFQHFDKIAILTYGELVFCG
                     TPEEMLGFFNNCGYPCPEHSNPFDFYMDLTSVDTQSREREIETYKRVQMLECAFKESD
                     IYHKILENIERARYLKTLPTVPFKTKDPPGMFGKLGVLLRRVTRNLMRNKQAVIMRLV
                     QNLIMGLFLIFYLLRVQNNTLKGAVQDRVGLLYQLVGATPYTGMLNAVNLFPMLRAVS
                     DQESQDGLYHKWQMLLAYVLHVLPFSVIATVIFSSVCYWTLGLYPEVARFGYFSAALL
                     APHLIGEFLTLVLLGIVQNPNIVNSIVALLSISGLLIGSGFIRNIQEMPIPLKILGYF
                     TFQKYCCEILWNEFYGLNFTCGGSNTSMLNHPMCAITQGVQFIEKTCPGATSRFTAN
                     FLILYGFIPALVILGIVIFKVRDYLISR"
ORIGIN



Query Match 18.1%; Score 284.4; DB 10; Length 2354;
Best Local Similarity 98.0%; Pred. No. 1.1e-76;
Matches 288; Conservative 0; Mismatches 6; Indels 0; Gaps 0;

Qy        285 ATTGGTGAACTGTTATCTCACGAGGATTCCAGGGCTGGGTAGGATCGGACAGGGCACTCC 344
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db          1 ATTGGTGAACTGTTATCTCACGAGGATTCCAGGGCTGGGTAGGATCGGACAGGGCACTCC 60

Qy        345 CATTGGCTCCTCAGTTAAAGCTGCCCTGGAGCCGGACAGGCCACTAGAAAATTCACTTGC 404
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db         61 CATTGGCTCCTCAGTTAAAGCTGCCCTGGAGCCGGACAGGCCACTAGAAAATTCACTTGC 120

Qy        405 ATTTGCTTCCTGCTAGCCATGGGTGAGCTGCCCTTTCTGAGTCCAGAGGGAGCCAGAGGG 464
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db        121 ATTTGCTTCCTGCTAGCCATGGGTGAGCTGCCCTTTCTGAGTCCAGAGGGAGCCAGAGGG 180

Qy        465 CCTCACATCAACAGAGGGTCTCTGAGCTCCCTGGAGCAAGGTTCGGTCACGGGCACAGAG 524
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db        181 CCTCACATCAACAGAGGGTCTCTGAGCTCCCTGGAGCAAGGTTCGGTCACGGGCACAGAG 240

Qy        525 GCTCGGCACAGCTTAGGTGTCCTGCATGTGTCCTACAGCGTCAGGTAAGGGGAC 578
              ||||||||||||||||||||||||||||||||||||||||||||  |  | | |
Db        241 GCTCGGCACAGCTTAGGTGTCCTGCATGTGTCCTACAGCGTCAGCAACCGTGTC 294
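
The percentages in each result summary can be cross-checked against the counts printed beside them. The following is a minimal sketch in Python. It assumes that Query Match is the Score divided by the perfect score, and that Best Local Similarity is Matches divided by the sum of Matches, Conservative, Mismatches and Indels. Both relationships are inferred from the figures in this report and are not taken from documentation of the search program. The perfect score of 1570 is the value listed in the report header, and all function and variable names are illustrative.

```python
# Perfect score for the query; assumed to be the 1570 listed in the report header.
PERFECT_SCORE = 1570.0


def query_match(score, perfect_score=PERFECT_SCORE):
    """Percent of the perfect score reached by a hit (inferred relationship)."""
    return 100.0 * score / perfect_score


def best_local_similarity(matches, conservative, mismatches, indels):
    """Percent of aligned columns that are exact matches (inferred relationship)."""
    columns = matches + conservative + mismatches + indels
    return 100.0 * matches / columns


# (score, matches, conservative, mismatches, indels) copied from the summaries.
SUMMARIES = {
    "RESULT 12": (298.8, 869, 2, 584, 91),
    "RESULT 15": (284.4, 288, 0, 6, 0),
    "RESULT 16": (282.8, 287, 0, 7, 0),
    "RESULT 17": (278.6, 788, 2, 561, 64),
}


def recompute(summaries=SUMMARIES):
    """Return {result: (query match %, best local similarity %)}, rounded to 0.1."""
    out = {}
    for name, (score, m, c, x, i) in summaries.items():
        out[name] = (
            round(query_match(score), 1),
            round(best_local_similarity(m, c, x, i), 1),
        )
    return out


if __name__ == "__main__":
    for name, (qm, bls) in recompute().items():
        print(f"{name}: Query Match {qm}%, Best Local Similarity {bls}%")
```

Under these assumptions the recomputed values agree with the percentages printed for results 12, 15, 16 and 17, which supports reading a split count such as "28 8" as 288.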



RESULT 16
AY195872
LOCUS       AY195872 2351 bp mRNA linear ROD 01-JUN-2003
DEFINITION  Mus musculus strain I/LnJ ATP-binding cassette sub-family G member
            5 (Abcg5) mRNA, complete cds.
ACCESSION   AY195872
VERSION     AY195872.1 GI:31322255
KEYWORDS
SOURCE      Mus musculus (house mouse)
  ORGANISM  Mus musculus
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Rodentia; Sciurognathi; Muridae; Murinae; Mus.
REFERENCE   1 (bases 1 to 2351)
  AUTHORS   Wittenburg,H., Lyons,M.A., Li,R., Churchill,G.A., Carey,M.C. and
            Paigen,B.
  TITLE     Primary Roles of FXR and ABCG5/ABCG8 in Cholesterol Gallstone
            Susceptibility: Evidence from a Cross of PERA/Ei and I/Ln Inbred
            Mice
  JOURNAL   Unpublished
REFERENCE   2 (bases 1 to 2351)
  AUTHORS   Lyons,M.A., Wittenburg,H., Walsh,K.A., Carey,M.C. and Paigen,B.
  TITLE     Direct Submission
  JOURNAL   Submitted (11-DEC-2002) The Jackson Laboratory, 600 Main Street,
            Bar Harbor, ME 04609, USA
FEATURES             Location/Qualifiers
     source          1..2351
                     /organism="Mus musculus"
                     /mol_type="mRNA"
                     /strain="I/LnJ"
                     /db_xref="taxon:10090"
                     /chromosome="17"
                     /map="55 cM"
                     /sex="male"
                     /tissue_type="liver"
     gene            1..2351
                     /gene="Abcg5"
     CDS             139..2097
                     /gene="Abcg5"
                     /note="ATP-dependent canalicular cholesterol transporter;
                     white subfamily"
                     /codon_start=1
                     /product="ATP-binding cassette sub-family G member 5"
                     /protein_id="AAO45093.1"
                     /db_xref="GI:31322256"
                     /translation="MGELPFLSPEGARGPHINRGSLSSLEQGSVTGTEARHSLGVLHV
                     SYSVSNRVGPWWNIKSCQQKWDRQILKDVSLYIESGQIMCILGSSGSGKTTLLDAISG
                     RLRCTGTLEGDVFVNGCELRRDQFQDCFSYVLQSDVFLSSLTVRETLRYTAMLALCRS
                     SADFYNKKVEAVMTELSLSHVADQVIGSYNFGGISSGERRRVSIAAQLLQDPKVMMLD
                     EPTTGLDCMTANQIVLLLAELARRDRIVIVTIHQPRSELFQHFDKIAILTYGELVFCG
                     TPEEMLGFFNNCGYPCPEHSNPFDFYMDLTSVDTQSREREIETYKRVQMLESAFKESD
                     IYHKILENIERARYLKTLPTVPFKTKDPPGMFGKLGVLLRRVTRNLMRNKQAVIMRLV
                     QNLIMGLFLIFYLLRVQNNTLKGAVQDRVGLLYQFVGATPYTGMLNAVNLFPMLRAVS
                     DQESQDGLYHKWQMLLAYVLHALPFSIIATVIFSSVCYWTLGLYPEVARFGYFSAALL
                     APHLIGEFLTLVLLGIVQNPNIVNSIVALLSISGLLIGSGFIRNIQEMPIPLKILGYF
                     TFQKYCCEILWNEFYGLNFTCGESNTTMLNHPMCAITQGVEFIEKTCPGATSRFTAN
                     FLILYGFIPALVILGIVIFKVRDYLISR"
ORIGIN



Query Match 18.0%; Score 282.8; DB 10; Length 2351;
Best Local Similarity 97.6%; Pred. No. 3.4e-76;
Matches 287; Conservative 0; Mismatches 7; Indels 0; Gaps 0;

Qy        285 ATTGGTGAACTGTTATCTCACGAGGATTCCAGGGCTGGGTAGGATCGGACAGGGCACTCC 344
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db          1 ATTGGTGAACTGTTATCTCACGAGGATTCCAGGGCTGGGTAGGATCGGACAGGGCACTCC 60

Qy        345 CATTGGCTCCTCAGTTAAAGCTGCCCTGGAGCCGGACAGGCCACTAGAAAATTCACTTGC 404
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db         61 CATTGGCTCCTCAGTTAAAGCTGCCCTGGAGCCGGACAGGCCACTAGAAAATTCACTTGC 120

Qy        405 ATTTGCTTCCTGCTAGCCATGGGTGAGCTGCCCTTTCTGAGTCCAGAGGGAGCCAGAGGG 464
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db        121 ATTTGCTTCCTGCTAGCCATGGGTGAGCTGCCCTTTCTGAGTCCAGAGGGAGCCAGAGGG 180

Qy        465 CCTCACATCAACAGAGGGTCTCTGAGCTCCCTGGAGCAAGGTTCGGTCACGGGCACAGAG 524
              |||||||||||||||||||||||||||||||||||||||||||||||||||||||| |||
Db        181 CCTCACATCAACAGAGGGTCTCTGAGCTCCCTGGAGCAAGGTTCGGTCACGGGCACGGAG 240

Qy        525 GCTCGGCACAGCTTAGGTGTCCTGCATGTGTCCTACAGCGTCAGGTAAGGGGAC 578
              ||||||||||||||||||||||||||||||||||||||||||||  |  | | |
Db        241 GCTCGGCACAGCTTAGGTGTCCTGCATGTGTCCTACAGCGTCAGCAACCGTGTC 294



RESULT 17
AC084265/c
LOCUS       AC084265 127066 bp DNA linear PRI 11-DEC-2001
DEFINITION  Homo sapiens chromosome 2, clone CTB-2367F13, complete sequence.
ACCESSION   AC084265
VERSION     AC084265.4 GI:17488659
KEYWORDS    HTG.
SOURCE      Homo sapiens (human)
  ORGANISM  Homo sapiens
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Primates; Catarrhini; Hominidae; Homo.
REFERENCE   1 (bases 1 to 127066)
  AUTHORS   Birren,B., Linton,L., Nusbaum,C. and Lander,E.
  TITLE     Homo sapiens chromosome 2, clone CTB-2367F13
  JOURNAL   Unpublished
REFERENCE   2 (bases 1 to 127066)
  AUTHORS   Birren,B., Linton,L., Nusbaum,C., Lander,E., Abraham,H., Allen,N.,
            Anderson,S., Barna,N., Bastien,V., Beda,F., Boguslavkiy,L.,
            Boukhgalter,B., Brown,A., Burkett,G., Campopiano,A., Castle,A.,
            Choepel,Y., Colangelo,M., Collins,S., Collymore,A., Cooke,P.,
            DeArellano,K., Dewar,K., Diaz,J.S., Dodge,S., Ferreira,P.,
            FitzHugh,W., Gage,D., Galagan,J., Gardyna,S., Ginde,S., Goyette,M.,
            Graham,L., Grand-Pierre,N., Hagos,B., Heaford,A., Horton,L.,
            Iliev,I., Johnson,R., Jones,C., Kann,L., Karatas,A., LaRocque,K.,
            Lamazares,R., Landers,T., Lehoczky,J., Levine,R., Lieu,C., Liu,G.,
            Macdonald,P., Marquis,N., McCarthy,M., McEwan,P., McKernan,K.,
            McPheeters,R., Meldrim,J., Meneus,L., Mihova,T., Mlenga,V.,
            Morrow,J., Murphy,T., Naylor,J., Norman,C.H., O'Connor,T.,
            O'Donnell,P., O'Neil,D., Olivar,T.M., Oliver,J., Peterson,K.,
            Pierre,N., Pisani,C., Pollara,V., Raymond,C., Rieback,M., Riley,R.,
            Rogov,P., Rothman,D., Roy,A., Santos,R., Schauer,S., Severy,P.,
            Sougnez,C., Spencer,B., Stange-Thomann,N., Stojanovic,N.,
            Strauss,N., Subramanian,A., Talamas,J., Tesfaye,S., Theodore,J.,
            Tirrell,A., Travers,M., Trigilio,J., Vassiliev,H., Viel,R., Vo,A.,
            Wilson,B., Wu,X., Wyman,D., Ye,W.J., Young,G., Zainoun,J.,
            Zimmer,A. and Zody,M.
  TITLE     Direct Submission
  JOURNAL   Submitted (18-OCT-2000) Whitehead Institute/MIT Center for Genome
            Research, 320 Charles Street, Cambridge, MA 02141, USA
REFERENCE   3 (bases 1 to 127066)
  AUTHORS   Birren,B., Linton,L., Nusbaum,C., Lander,E., Ali,A., Allen,N.,
            Anderson,S., Barna,N., Bastien,V., Boguslavkiy,L., Boukhgalter,B.,
            Brown,A., Camarata,J., Campopiano,A., Chang,J., Chazaro,B.,
            Choepel,Y., Colangelo,M., Collins,S., Collymore,A., Cook,A.,
            Cooke,P., DeArellano,K., Dewar,K., Diaz,J.S., Dodge,S., Faro,S.,
            Ferreira,P., FitzHugh,W., Gage,D., Galagan,J., Gardyna,S.,
            Ginde,S., Gord,S., Goyette,M., Graham,L., Grand-Pierre,N.,
            Hagos,B., Heaford,A., Horton,L., Hulme,W., Iliev,I., Johnson,R.,
            Jones,C., Kamat,A., Karatas,A., Kells,C., LaRocque,K.,
            Lamazares,R., Landers,T., Lehoczky,J., Levine,R., Liu,G.,
            MacLean,C., Macdonald,P., Major,J., Marquis,N., Matthews,C.,
            McCarthy,M., McEwan,P., McKernan,K., McPheeters,R., Meldrim,J.,
            Meneus,L., Mihova,T., Mlenga,V., Murphy,T., Naylor,J., Nguyen,C.,
            Norbu,C., Norman,C.H., O'Connor,T., O'Donnell,P., O'Neil,D.,
            Oliver,J., Peterson,K., Phunkhang,P., Pierre,N., Pollara,V.,
            Raymond,C., Retta,R., Rieback,M., Riley,R., Rise,C., Rogov,P.,
            Roman,J., Rosetti,M., Roy,A., Santos,R., Schauer,S., Schupback,R.,
            Seaman,S., Severy,P., Spencer,B., Stange-Thomann,N., Stojanovic,N.,
            Strauss,N., Subramanian,A., Talamas,J., Tesfaye,S., Theodore,J.,
            Topham,K., Travers,M., Travis,N., Trigilio,J., Vassiliev,H.,
            Viel,R., Vo,A., Wilson,B., Wu,X., Wyman,D., Ye,W.J., Young,G.,
            Zainoun,J., Zembek,L., Zimmer,A. and Zody,M.
  TITLE     Direct Submission
  JOURNAL   Submitted (24-AUG-2001) Whitehead Institute/MIT Center for Genome
            Research, 320 Charles Street, Cambridge, MA 02141, USA



REFERENCE   4 (bases 1 to 127066)
  AUTHORS   Birren,B., Linton,L., Nusbaum,C., Lander,E., Ali,A., Allen,N.,
            Anderson,S., Barna,N., Bastien,V., Boguslavkiy,L., Boukhgalter,B.,
            Brown,A., Camarata,J., Campopiano,A., Chang,J., Chazaro,B.,
            Choepel,Y., Colangelo,M., Collins,S., Collymore,A., Cook,A.,
            Cooke,P., DeArellano,K., Dewar,K., Diaz,J.S., Dodge,S., Faro,S.,
            Ferreira,P., FitzHugh,W., Gage,D., Galagan,J., Gardyna,S.,
            Ginde,S., Gord,S., Goyette,M., Graham,L., Grand-Pierre,N.,
            Hagos,B., Heaford,A., Horton,L., Hulme,W., Iliev,I., Johnson,R.,
            Jones,C., Kamat,A., Karatas,A., Kells,C., LaRocque,K.,
            Lamazares,R., Landers,T., Lehoczky,J., Levine,R., Liu,G.,
            MacLean,C., Macdonald,P., Major,J., Marquis,N., Matthews,C.,
            McCarthy,M., McEwan,P., McKernan,K., McPheeters,R., Meldrim,J.,
            Meneus,L., Mihova,T., Mlenga,V., Murphy,T., Naylor,J., Nguyen,C.,
            Norbu,C., Norman,C.H., O'Connor,T., O'Donnell,P., O'Neil,D.,
            Oliver,J., Peterson,K., Phunkhang,P., Pierre,N., Pollara,V.,
            Raymond,C., Retta,R., Rieback,M., Riley,R., Rise,C., Rogov,P.,
            Roman,J., Rosetti,M., Roy,A., Santos,R., Schauer,S., Schupback,R.,
            Seaman,S., Severy,P., Spencer,B., Stange-Thomann,N., Stojanovic,N.,
            Strauss,N., Subramanian,A., Talamas,J., Tesfaye,S., Theodore,J.,
            Topham,K., Travers,M., Travis,N., Trigilio,J., Vassiliev,H.,
            Viel,R., Vo,A., Wilson,B., Wu,X., Wyman,D., Ye,W.J., Young,G.,
            Zainoun,J., Zembek,L., Zimmer,A. and Zody,M.
  TITLE     Direct Submission
  JOURNAL   Submitted (11-DEC-2001) Whitehead Institute/MIT Center for Genome
            Research, 320 Charles Street, Cambridge, MA 02141, USA
COMMENT     On Dec 11, 2001 this sequence version replaced gi:15284200.
            All repeats were identified using RepeatMasker:
            Smit, A.F.A. & Green, P. (1996-1997)
            http://ftp.genome.washington.edu/RM/RepeatMasker.html
            Genome Center
            Center: Whitehead Institute/MIT Center for Genome Research
            Center code: WIBR
            Web site: http://www-seq.wi.mit.edu
            Contact: sequence_submissions@genome.wi.mit.edu
            Project Information
            Center project name: L11578
            Center clone name: 2367 F 13



FEATURES             Location/Qualifiers
     source          1..127066
                     /organism="Homo sapiens"
                     /mol_type="genomic DNA"
                     /db_xref="taxon:9606"
                     /chromosome="2"
                     /map="2"
                     /clone="CTB-2367F13"
                     /clone_lib="CITB Human BAC"
     repeat_region   complement(8..170)
                     /rpt_family="MER47A"
     repeat_region   171..468
                     /rpt_family="AluSx"
     repeat_region   complement(469..516)
                     /rpt_family="MER47A"
     repeat_region   988..1049
                     /rpt_family="MIR"
     repeat_region   complement(1294..1448)
                     /rpt_family="L1ME4A"
     repeat_region   complement(2662..2954)
                     /rpt_family="AluSx"
     repeat_region   4049..4431
                     /rpt_family="L2"
     unsure          5261..5269
                     /note="<30 qual SNGL region"
     unsure          7192..7202
                     /note="<30 qual SNGL region"
     repeat_region   7310..7472
                     /rpt_family="MIR"
     repeat_region   7488..7582
                     /rpt_family="MIR"
     repeat_region   7589..7628
                     /rpt_family="(TTG)n"
     repeat_region   complement(7631..7781)
                     /rpt_family="AluSg/x"
     repeat_region   7791..7922
                     /rpt_family="MIR"
     repeat_region   complement(7977..8300)
                     /rpt_family="AluJb"
     repeat_region   9044..9343
                     /rpt_family="AluSq"
     repeat_region   10315..10344
                     /rpt_family="AT_rich"
     repeat_region   10355..10681
                     /rpt_family="AluJo"
     repeat_region   10683..10993
                     /rpt_family="AluSx"
     repeat_region   complement(12221..12282)
                     /rpt_family="MIR3"
     repeat_region   complement(12306..12449)
                     /rpt_family="MIR"
     repeat_region   complement(13008..13189)
                     /rpt_family="MER33"
     repeat_region   complement(13190..13471)
                     /rpt_family="AluJo"
     repeat_region   complement(13472..13612)
                     /rpt_family="MER33"
     repeat_region   13899..13922
                     /rpt_family="GC_rich"
     repeat_region   complement(14184..14250)
                     /rpt_family="L2"
     repeat_region   14552..14630
                     /rpt_family="MER5A"
     repeat_region   complement(14809..15100)
                     /rpt_family="AluSx"
     repeat_region   complement(15363..15679)
                     /rpt_family="AluY"
     repeat_region   complement(15681..15979)
                     /rpt_family="AluSx"
     repeat_region   16292..16388
                     /rpt_family="L2"
     repeat_region   16392..16508
                     /rpt_family="MLT1F"
     repeat_region   complement(16538..16616)
                     /rpt_family="LTR37B"
     repeat_region   16618..16687
                     /rpt_family="Alu"
     repeat_region   complement(16988..17104)
                     /rpt_family="L2"
     repeat_region   17540..17895
                     /rpt_family="MLT1A1"
     repeat_region   complement(17911..18209)
                     /rpt_family="AluSq"
     repeat_region   18487..18680
                     /rpt_family="LTR16A1"
     repeat_region   18802..19026
                     /rpt_family="AluJo"
     repeat_region   complement(19092..19390)
                     /rpt_family="AluJo"
     repeat_region   complement(21369..21675)
                     /rpt_family="AluSx"
     repeat_region   complement(22474..22763)
                     /rpt_family="MER115"
     repeat_region   complement(22843..22942)
                     /rpt_family="MER115"
     repeat_region   23239..23311
                     /rpt_family="L3"
     repeat_region   complement(23968..24265)

Query Match 17.7%; Score 278.6; DB 9; Length 127066; 

Best Local Similarity 55.7%; Pred. No. 8.1e-75; 

Matches 788; Conservative 2; Mismatches 561; Indels 64; Gaps 11; 

Qy 12 0 CTTTGCTCCTTAGAGCTGGGGCACCTGAGCCCTCCTCTGTGCCAGCCTTTCTCCCAGCAT 179 

II I I I I I I I I I I I I I II II III I I II I I I I I II I I I I II 

Db 2 0948 CTCTGTTTCCTGGAGCAGGGACACCTCGGCCTCCTGCCCTGGGCCCGTCTCTCCCAGCAT 20889 

Qy 18 0 TCCTYTCTGGCAAACACTTCCTATAAACACACCGTGTGTTCTGCCTATTGTCGAGATAAG 239 

I I I I : I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I 

Db 20888 T CCTT GCT GGCAAGCCCACCTACAAACGT GT GT GTT CT GC CCACT GT C AAGAT AAG 20833 

Qy 240 GACACTCTGGCTAAAGGTACATCAGATAATGGCATCGTTGGCCAAATTGGTGAACTGTTA 299 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I 

Db 2 0832 GACGCGCTGGCTAAAGGTACATCAGATAATGGTCTCCGTGGCCAAGTCCCAGTCCTGCTG 20773 

Qy 300 TCTCACGAGGATTCCAGGGCTGGGTAGGATCGGACAGGGCACTCCCATTGGCTCCTCAGT 359 

I I I I I I II I I I I I I II I I I I I I I I I I I I I I 

Db 2 0772 TCCCAAGGGACTCCGGGGTCAGGTGGAGCAGGCAGGGCAGTCTGCCACGGGCTCCCCAAC 2 0713 

Qy 360 TAAAGCTGCCCTGGAGCCGGACAGGCCACTAGAAAATTCACTTGCATTTGCTTCCTGCTA 419 

I M I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

Db 20712 TGAAGCCACTCTGGGGAGGGTCCGGCCACCAGAAAATTTGCCCAGCTTTGCTGCCTGTTG 20653 

Qy 420 GCCATGGGTGAGCTGCCCTTTCTGAGTCCAGAGGGAGCCAGAGGGCCTCACATCAACAGA 47 9 

I I I I I I I I I I I I I I I I I II III II III III II I I I I I I I 
Db 20652 GCCATGGGTGACCTCTCATCTTTGACCCCCGGAGGGTCCATGGGTCTCCAAGTAAACAGA 20593 

Qy 48 0 GGGTCTCTGAGCTCCCTGGAGCAAGGTTCGGTCACGGGCACAGAGGCTCGGCACAGCTTA 539 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I 
Db 2 0592 GGCTCCCAGAGCTCCCTGGAGGGGGCTCCTGCCACCGCCCCGGAGCCT CACAGCCTG 20536 



Qy 540 GGTGTCCTGCATGTGTCCTACAGCGTCAGGTAAGGGGACCTCCACAGCAAAAAGCTAGGC 599



Db 20535 GGCATCCTCCATGCCTCCTACAGCGTCAGGTAAGGCAGAGCCCTT GCTGCTG 20484 

Qy 600 TCTCTGATTGCCTTTTCTGAATGGGTGGGTGGGCCTGTGGGCTTTGGGTTGTCTGTCCAG 659 

II II II I I I I I I I I I I I 

Db 20483 CTGCTCCCCCAGGAGTGCGGGGCCCGGCGCTCACCCCTCTGCTGCCTTTCTTCACTCTTT 2 0424 

Qy 660 CAGATCAGGGTGAAAGTGGACAGTCTGTAACAACAGTGAGTCGTTCCTCCTCCTCCTCCT 719 

I I I I I I I I I I I I I I I I 

Db 20423 AAGTGCCAGTCTGGGCACTTCGGGCTCCCTCTTTAGTGGATCGGGTGGAGAGAGGAGAGG 20364 

Qy 720 GCGCAGGGCAGAGCCTGGACATTAAAACATGCCCTGCCTGAAGCCGCTTGCTG-CTTCTC 778 

I I I I I I I I I I I I I I I I II II III I I I I 

Db 20363 GAGAAGGGCTGTGCTGGGAAACATGGAGCGACAGTGAATGGCCCCTCCCCCTGCCCAGGG 20304 

Qy 77 9 ACTGATTTCTGCTCTCCCCTTCCTTGACTCGCCCACCACCTGTCCTGTGTAGATGGAGAA 83 8 

II III I I I I I I I II I I I I I I I 

Db 20303 AAGGGCCTGGGCATAAACAAAGTGGCAGCAGTGCCCTGCCAACCCAGTGTCTACGGCCTG 20244 

Qy 839 GGCTCG GAGAGTGGGGGTGCTGGGGGCACAAAATGGAATGAACACTGCTGAAGG 8 92 

III I I I I I I I I I I I I I I III I I I I I I I I I I I I I 

Db 2 0243 C C CT C T GT GGAT G GGAAT G G G GGT AC T G C GAAT G CAAG GAGT CT T GAAAC CT GGT GAAAG 20184 

Qy 8 93 AAT GC AG G GT T C ACT T CAAGAAGAAAG C AGT GT G CAG GT GT AC CAT C T C C C AGT C AGAGA 952 

I I I I I I I I I II I I I I I II I I I I I 

Db 20183 AATGCAGGG ACAGCCACCTCGCAGCCAAACGGACAGGACATTCAGAGCAACTCC 2 0130 

Qy 953 CCCAGTAATCAGAGCAGCTAATGGGAGGCATGCTCCTTGGGTGGTGGCCAACTTGTCATT 1012 

I I I I I I I I I I I III III I I I I III 

Db 20129 AGCACAGGCCCCCTCCCTACGTGGCAGACAGCCTCAGTCGCTATCTGCCAGGTTCTACAG 20070 

Qy 1013 ATAC CT C CAAGGACAACAGAGTGGT ACATAAGGCTAAAACAGAGTT GT CAAC CT GTC CAG 1072 

I I I I I I I I I I I I I I II I I 

Db 20069 AGGAGGGCG CAGAGAC T GAAACAC GT T AGGAGC CT GT C C GG AGAC T ACT GGG G 20017 

Qy 1073 GGGCAACTGGGATGGGGTAGGGCTGGGAGCAGGGGTCTGGCACCTTCCAGGACCCTACTC 1132 

II II II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

Db 20016 TGGGGCACAGGTAGGATCAATGCTGGGGACCTGGGTGTGGCCCCTTCCAGGGCCCCAAGC 19957 

Qy 1133 TGCCTTTGCCCTTGTGGGATTTCCTTTAAAGCAACCGTGTCGGGCCTTGGTGGAACATCA 1192 

I I I I I I I I II I I I I I I I I I I I II I I I I I I I I I II I I I I I I I I I I I I I I II 
Db 19956 TGCCTTTGCCTTCCTGGGGTTTCCTTTAAAGCCACCGCGTGAGGCCCTGGTGGGACATCA 19897 

Qy 1193 AAT CAT G C CAG CAGAAGT G G GACAGGCAAAT C CT CAAAGAT GTCTCCTT GT ACAT CGAGA 1252 

III I I I I I I I I I I I I I MINI I I I I I I II I I I I I I I I I I I I I I II I I I I I 

Db 19896 CAT CT T G C C GG C AGC AGT GGACC AGGC AGAT C CT CAAAGAT GTCTCCTT GT AC GT GGAGA 19837 

Qy 1253 GTGGCCAGATTATGTGCATCTTAGGCAGCTCAGGTAAGTGCCTGGGGGGSCSGGGGCTCC 1312 

I II I I I I I I I I I I I I I I I I I I I I I I I I I I I II I 
Db 19836 GC G GG C AG AT CAT GT GC AT C C TAG GAAGC T C AGGT AAG 19799 

Qy 1313 TGTACTTCTAAGGCAGGCTCTGGGAGGCTTTGGCTCYGTCTAAGCACAATGTTTAAGAAG 1372 

I I I I I I I I I I I I I I : I II III I 

Db 19798 CTTGGGAAGGAGGATTCTAAAAAGGATTTGGCTTGAGTTAAACTCCACATTGAAGAA 19742 

Qy 1373 T RAGT T T AAGT T GT AGAGAG G CAG C CAT GC AT T T G GC AT T T GAAT AC AAT CT GGT GACT T 1432 

I I I I I I I I I I I I II I I I I I I I I I II! I I I I I I I I III I 



Db 19741 ACAGATTAAGTTGTAACAAGAAAGCCACAGGTTTGATATTAGAATGAATTCTATTGA--T 19684



Qy 1433 GTCTGGCTGC CAAT AGAAC C T AGT AC CAAAGT GAAAT C T T GAGGAAAAT C C CT G GAAAGA 1492 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 19683 AT CT GACT GT GAAT GGAA- CT GCT AC CAAT GT GAAAT CT T T AGAAAGAT - C CTT GAAAGA 19626 

Qy 1493 GTGGAAAGTCCTGCCTAACACGTAAGTGCCTTCTT 1527 

II I I I I I I I I I I I I I I I II I I I II I I 
Db 19625 GTATAAAATT CT GCCTAACAT GT AC GT GAAT T CAT 19591 
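
The summary lines printed for each result appear to be related by simple arithmetic. The Python sketch below is an illustration only and is not part of the search program's output. It assumes that Query Match is the score divided by the perfect score (1570 for this query) and that Best Local Similarity is the number of matches divided by the total number of alignment columns (matches + conservative + mismatches + indels). Both relations are inferred from the figures in this report, and the function names are hypothetical.

```python
# Assumed relations, inferred from the figures printed in this report.

def query_match_pct(score: float, perfect_score: float) -> float:
    """Score as a percentage of the perfect (self-match) score."""
    return 100.0 * score / perfect_score


def best_local_similarity_pct(matches: int, conservative: int,
                              mismatches: int, indels: int) -> float:
    """Matches as a percentage of all aligned columns."""
    columns = matches + conservative + mismatches + indels
    return 100.0 * matches / columns


if __name__ == "__main__":
    # Figures taken from the summary lines of the result above.
    print(round(query_match_pct(278.6, 1570), 1))
    print(round(best_local_similarity_pct(788, 2, 561, 64), 1))
```

With the figures of the result above (Score 278.6; Matches 788; Conservative 2; Mismatches 561; Indels 64), the two functions reproduce the printed 17.7% and 55.7% after rounding to one decimal place.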



RESULT 18 

AC108476/C 

LOCUS AC108476 139342 bp DNA linear PRI 16-APR-2002
DEFINITION Homo sapiens BAC clone RP11-1413K20 from 2, complete sequence.
ACCESSION AC108476
VERSION AC108476.5 GI:19807988
KEYWORDS HTG.
SOURCE Homo sapiens (human)
ORGANISM Homo sapiens
Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
Mammalia; Eutheria; Primates; Catarrhini; Hominidae; Homo.
REFERENCE 1 (bases 1 to 139342)
AUTHORS Sulston,J.E. and Waterston,R.
TITLE Toward a complete human genome sequence
JOURNAL Genome Res. 8 (11), 1097-1108 (1998)
MEDLINE 99063792
PUBMED 9847074
REFERENCE 2 (bases 1 to 139342)
AUTHORS Harkins,C, Haakenson,W. and Doebber,A.
TITLE The sequence of Homo sapiens BAC clone RP11-1413K20
JOURNAL Unpublished (2001)
REFERENCE 3 (bases 1 to 139342)
AUTHORS Waterston,R.H.
TITLE Direct Submission
JOURNAL Submitted (27-JAN-2002) Genome Sequencing Center, Washington
University School of Medicine, 4444 Forest Park Parkway, St. Louis,
MO 63108, USA
REFERENCE 4 (bases 1 to 139342)
AUTHORS Waterston,R.H.
TITLE Direct Submission
JOURNAL Submitted (20-FEB-2002) Genome Sequencing Center, Washington
University School of Medicine, 4444 Forest Park Parkway, St. Louis,
MO 63108, USA
REFERENCE 5 (bases 1 to 139342)
AUTHORS Waterston,R.H.
TITLE Direct Submission
JOURNAL Submitted (29-MAR-2002) Genome Sequencing Center, Washington
University School of Medicine, 4444 Forest Park Parkway, St. Louis,
MO 63108, USA
REFERENCE 6 (bases 1 to 139342)
AUTHORS Waterston,R.
TITLE Direct Submission
JOURNAL Submitted (16-APR-2002) Department of Genetics, Washington
University, 4444 Forest Park Avenue, St. Louis, Missouri 63108, USA
COMMENT On Mar 29, 2002 this sequence version replaced gi:18767626.
Genome Center 



Center: Washington University Genome Sequencing Center 
Center code: WUGSC 

Web site: http://genome.wustl.edu/gsc 
Contact: sapiens@watson.wustl.edu

Summary Statistics 

Center project name: H_NH1413K20 



NOTICE: This sequence may not represent the entire insert of this 
clone. It may be shorter because we only sequence overlapping 
clone sections once, or longer because we provide a small overlap 
between neighboring data submissions. 

This sequence was finished as follows unless otherwise noted: 
all regions were double stranded, sequenced with an alternate 
chemistry, or covered by high quality data (i.e., phred quality >= 
30); an attempt was made to resolve all sequencing problems, such 
as compressions and repeats; all regions were covered by sequence 
from more than one subclone; and the assembly was confirmed by 
restriction digest. 

MAPPING INFORMATION: 

Mapping information for this clone was provided by Dr. John D. 
McPherson, Department of Genetics, Washington University, St. Louis 
MO. For additional information about the map position of this 
sequence, see http://genome.wustl.edu/gsc 

SOURCE INFORMATION: 

The RPCI-11 human BAC library was made from the blood of one male 
donor, as described by Osoegawa,K., Woon,P.Y., Zhao,B., Frengen,E., 
Tateno,M., Catanese,J.J. and de Jong,P.J. (1998) An improved
approach for construction of bacterial artificial chromosome 
libraries. Genomics 51:1-8. The clone may be obtained either from 
Research Genetics, Inc. (http://www.resgen.com) or Pieter de Jong 
and coworkers at http://www.chori.org 
VECTOR: pBACe3.6 

NEIGHBORING SEQUENCE INFORMATION: 

The clone sequenced to the left is RP11-489K22, 2000 bp overlap. 
Actual end is at base position 139342 of RP11-1413K20.

The region between 132012 and 132017 is covered only by a PCR
product of clone DNA. 
FEATURES Location/Qualifiers
source 1..139342
/organism="Homo sapiens"
/mol_type="genomic DNA"
/db_xref="taxon:9606"
/chromosome="2"
/map="2"
/clone="RP11-1413K20"
/clone_lib="RPCI-11"
misc_feature 55..655
/note="match to EST AA203458 (NID:g1799169) zx58b04.r1"
misc_feature 93..286
/note="match to EST AV689089 (NID:g10290952)"
misc_feature 93..286
/note="similar to Mus musculus EST AI597378 (NID:g4606426)
vj29c06.y1"
misc_feature 93..279
/note="match to EST AV660973 (NID:g9881987)"
misc_feature 318..653
/note="match to EST R00405 (NID:g750141) ye71e05.r1"
misc_feature 372..633
/note="similar to Homo sapiens EST T97887 (NID:g747232)
ye58h05.r1"
misc_feature 706..708
/note="match to EST R00405 (NID:g750141) ye71e05.r1"
misc_feature 706..707
/note="similar to Homo sapiens EST T97887 (NID:g747232)
ye58h05.r1"
repeat_region 847..1139
/rpt_family="Alu"
misc_feature 1867..2047
/note="match to EST T39945 (NID:g647612) ya13g04.r1"
repeat_region 2234..2616
/rpt_family="L2"
misc_feature 2983..3121
/note="match to EST AV689089 (NID:g10290952)"
misc_feature 2983..3121
/note="similar to Mus musculus EST AI597378 (NID:g4606426)
vj29c06.y1"
misc_feature 3044..3121
/note="match to EST T86384 (NID:g714736) yd77b08.r1"
misc_feature 4099..4304
/note="match to EST T86384 (NID:g714736) yd77b08.r1"
misc_feature 4099..4283
/note="match to EST AV689089 (NID:g10290952)"
misc_feature 4401..4618
/note="similar to Mus musculus EST BF162656
(NID:g11042879)"
misc_feature 4405..4454
/note="match to EST T86384 (NID:g714736) yd77b08.r1"
misc_feature 4724..5110
/note="similar to Homo sapiens EST AV656623
(NID:g9877637)"
misc_feature 5075..5204
/note="similar to Mus musculus EST BF162656
(NID:g11042879)"
repeat_region 5495..5657
/rpt_family="MIR"
repeat_region 5673..5767
/rpt_family="MIR"
repeat_region 5774..5813
/rpt_family="(TTG)n"
repeat_region 5816..5958
/rpt_family="Alu"
repeat_region 5976..6091
/rpt_family="MIR"
repeat_region 6162..6485
/rpt_family="Alu"
misc_feature 6351..6373
/note="match to EST AA228345 (NID:g1849916) nc39d04.s1"
misc_feature 6352..6364
/note="match to EST AI431309 (NID:g4302284) ar55b01.x1"
misc_feature 6352..6364
/note="match to EST AI469772 (NID:g4331862) tm20f11.x1"
misc_feature 6353..6367
/note="match to EST AI241685 (NID:g3837082) qu70f06.x1"
misc_feature 6568..6707
/note="similar to Mus musculus EST BF162656
(NID:g11042879)"
misc_feature 6649..6707
/note="similar to Mus musculus EST BB598373
(NID:g16450340)"
repeat_region 7229..7528
/rpt_family="Alu"
misc_feature 7940..8549
/note="similar to EST BM725726 (NID:g19047059)"
misc_feature 8169..8305
/note="similar to Mus musculus EST BF162656
(NID:g11042879)"
misc_feature 8169..8301
/note="similar to Mus musculus EST BB598373
(NID:g16450340)"
repeat_region 8500..8529
/rpt_family="AT_rich"
repeat_region 8540..8868
/rpt_family="Alu"
repeat_region 8870..9180
/rpt_family="Alu"
repeat_region 10493..10636
/rpt_family="MIR"
repeat_region 11195..11376
/rpt_family="MER1_type"
repeat_region 11377..11658
/rpt_family="Alu"
repeat_region 11659..11799
/rpt_family="MER1_type"
misc_feature 11955..12053
/note="similar to Mus musculus EST BB598373
(NID:g16450340)"
misc_feature 11994..12053
/note="similar to Mus musculus EST AA239884 (NID:g1863923)
mx81d01.r1"
repeat_region 12086..12109

Query Match 17.7%; Score 278.6; DB 9; Length 139342;

Best Local Similarity 55.7%; Pred. No. 8.2e-75;

Matches 788; Conservative 2; Mismatches 561; Indels 64; Gaps 11;



Qy 12 0 CTTTGCTCCTTAGAGCTGGGGCACCTGAGCCCTCCTCTGTGCCAGCCTTTCTCCCAGCAT 17 9 

I I I I I I I I I I I I I I I I I I I III I I II I I I I I I I I I I I M 

Db 19164 CTCTGTTTCCTGGAGCAGGGACACCTCGGCCTCCTGCCCTGGGCCCGTCTCTCCCAGCAT 19105 

Qy 18 0 TCCTYTCTGGCAAACACTTCCTATAAACACACCGTGTGTTCTGCCTATTGTCGAGATAAG 239 

I M I : I I I I I II I I I II I I I I I I I I I I I I I I I I I I I I I I I I 

Db 19104 TCCTTGCTGGCAAGCCCACCTACAAACGT GTGTGTTCTGCCCACTGTCAAGATAAG 19049 

Qy 240 GACACTCTGGCTAAAGGTACATCAGATAATGGCATCGTTGGCCAAATTGGTGAACTGTTA 299 

I I I I I I I I I I I II I I I I I I I I I I I I I I I I I II I I I I I II I I I M I 



Db 19048 GACGCGCTGGCTAAAGGTACATCAGATAATGGTCTCCGTGGCCAAGTCCCAGTCCTGCTG 18989 

Qy 300 TCTCACGAGGATTCCAGGGCTGGGTAGGATCGGACAGGGCACTCCCATTGGCTCCTCAGT 359 

I I I I I I II I I I I I I II I Mill I II I I I II 

Db 18 988 TCCCAAGGGACTCCGGGGTCAGGTGGAGCAGGCAGGGCAGTCTGCCACGGGCTCCCCAAC 18 929 

Qy 360 TAAAGCTGCCCTGGAGCCGGACAGGCCACTAGAAAATTCACTTGCATTTGCTTCCTGCTA 419 

I I II I I I I I I I II I I I I I I I I II I I I I I I I I I I I I II I I I 

Db 18928 TGAAGCCACTCTGGGGAGGGTCCGGCCACCAGAAAATTTGCCCAGCTTTGCTGCCTGTTG 18869 

Qy 42 0 GC CAT G G GT GAG CT G CCCTTTCT GAGT C CAGAGGGAG C C AGAG GGC CT C ACAT CAAC AGA 479 

I I I I I I I I I I I I I I I I I I I III II I II III II I I I I I I I 
Db 18 868 G C CAT G GGT GAC CT C T CAT CT T T GAC C C C C G GAGG GT C CAT G GGT CT C CAAGTAAAC AGA 188 09 

Qy 480 GGGTCTCTGAGCTCCCTGGAGCAAGGTTCGGTCACGGGCACAGAGGCTCGGCACAGCTTA 539 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

Db 18808 GGCTCCCAGAGCTCCCTGGAGGGGGCTCCTGCCACCGCCCCGGAGCCT CACAGCCTG 18752 

Qy 54 0 GGTGTCCTGCATGTGTCCTACAGCGTCAGGTAAGGGGACCTCCACAGCAAAAAGCTAGGC 599 

II I I I I I I I I I I I I I I I I I I II I I I I I I I I II III 

Db 18751 GGCATCCTCCATGCCTCCTACAGCGTCAGGTAAGGCAGAGCCCTT GCTGCTG 18700 

Qy 600 TCTCTGATTGCCTTTTCTGAATGGGTGGGTGGGCCTGTGGGCTTTGGGTTGTCTGTCCAG 659 

II II II I I I I I I I I I I I 

Db 18699 CTGCTCCCCCAGGAGTGCGGGGCCCGGCGCTCACCCCTCTGCTGCCTTTCTTCACTCTTT 18640 

Qy 660 CAGATCAGGGTGAAAGTGGACAGTCTGTAACAACAGTGAGTCGTTCCTCCTCCTCCTCCT 719 

Mil I I I I I I I II I I I 

Db 18639 AAGTGCCAGTCTGGGCACTTCGGGCTCCCTCTTTAGTGGATCGGGTGGAGAGAGGAGAGG 18580 

Qy 72 0 GCGCAGGGCAGAGCCTGGACATTAAAACATGCCCTGCCTGAAGCCGCTTGCTG-CTTCTC 778 

I I II I II I I I II I I I I II II III I II I 

Db 18579 GAGAAGGGCTGTGCTGGGAAACATGGAGCGACAGTGAATGGCCCCTCCCCCTGCCCAGGG 18520 

Qy 77 9 ACTGATTTCTGCTCTCCCCTTCCTTGACTCGCCCACCACCTGTCCTGTGTAGATGGAGAA 838 

II III I I I I I II I I I I I I I I I 

Db 18519 AAGGGCCTGGGCATAAACAAAGTGGCAGCAGTGCCCTGCCAACCCAGTGTCTACGGCCTG 184 60 

Qy 839 GGCTCG GAGAGTGGGGGTGCTGGGGGCACAAAATGGAATGAACACTGCTGAAGG 8 92 

III I I I II I I I I I I M I II I I I I I I I I I I II I I 

Db 18 459 CCCTCTGTGGATGGGAATGGGGGTACTGCGAATGCAAGGAGTCTTGAAACCTGGTGAAAG 184 00 

Qy 893 AAT G CAG GGT T C AC T T CAAGAAGAAAGC AGT GT GC AG GT GT AC CAT C T C C C AGT CAGAGA 952 

I I I II I I II II I I I I I II I I I I I 

Db 18 399 AATGCAGGG ACAGC CAC CTCGCAGC CAAAC GGACAGGACATT CAGAGCAACT C C 1834 6 

Qy 953 CCCAGTAATCAGAGCAGCTAATGGGAGGCATGCTCCTTGGGTGGTGGCCAACTTGTCATT 1012 

II I I II I I I II III III I I I I III 

Db 18345 AGCACAGGCCCCCTCCCTACGTGGCAGACAGCCTCAGTCGCTATCTGCCAGGTTCTACAG 18286 

Qy 1013 AT AC C T C CAAG GAC AAC AGAGT G GT AC AT AAG GCT AAAAC AGAGT T GT CAAC C T GT C CAG 1072 

I I I I I I I I I I I II I I I I I 

Db 182 85 AGGAGGGCG C AGAG ACT GAAACAC GT T AG GAG CCTGTCCG GAGACT AC T GG G G 18233 

Qy 1073 GGGCAACTGGGATGGGGTAGGGCTGGGAGCAGGGGTCTGGCACCTTCCAGGACCCTACTC 1132 

M II II I I I I II I I I II I I II I II II I I II I I I I I I 

Db 182 32 TGGGGCACAGGTAGGATCAATGCTGGGGACCTGGGTGTGGCCCCTTCCAGGGCCCCAAGC 1817 3 



Qy 1133 TGCCTTTGCCCTTGTGGGATTTCCTTTAAAGCAACCGTGTCGGGCCTTGGTGGAACATCA 1192 

I I I I I I I 1 I I I I I I I II I I I I I I I I I II I I I I II I I II I I I I I I I II II I 
Db 18172 TGCCTTTGCCTTCCTGGGGTTTCCTTTAAAGCCACCGCGTGAGGCCCTGGTGGGACATCA 18113 

Qy 1193 AAT CAT GC C AG C AGAAGT G G GAC AG GC AAAT C C T C AAAG AT GT CT C C T T GT AC AT C GAGA 1252 

III I I I I MM II II I I II II I I I I I I I I I I I I I I M II I II II I I I I I I I 
Db 18112 CAT C T T GC C GGC AGC AGT GGACC AG G C AGAT C CT CAAAGAT GTCTCCTT GT AC GT GGAG A 18053 

Qy 1253 GTGGCCAGATTATGTGCATCTTAGGCAGCTCAGGTAAGTGCCTGGGGGGSCSGGGGCTCC 1312 

I II I I I I I II I I II II I I II I I II I I I II II I I 

Db 18052 GC G GGC AGAT CAT GT GCAT CCT AGGAAGCT CAGGTAAG 18015 

Qy 1313 TGTACTTCTAAGGCAGGCTCTGGGAGGCTTTGGCTCYGTCTAAGCACAATGTTTAAGAAG 1372 

I II II I II I I I I I I Mil I I I I 

Db 18014 CTTGGGAAGGAGGATTCTAAAAAGGATTTGGCTTGAGTTAAACTCCACATTGAAGAA 17958 

Qy 1373 T RAGT T T AAGT T GT AGAGAGGC AG C CAT G CAT T T G G CAT T T GAAT AC AAT CT G GT GACT T 1432 

I I I I I I I II I I I II I II I I I I II I I I I I I I I I II Ml I 

Db 17957 ACAGATTAAGTTGTAACAAGAAAGCCACAGGTTTGATATTAGAATGAATTCTATTGA--T 17900

Qy 1433 GTCTGGCTGC CAAT AGAAC C T AGT AC CAAAGT GAAAT C T T GAGGAAAAT C C CT GGAAAGA 1492 

I I I I I II I II I II II I I I I M I I M II I I I I II I I I I I I I I II I I I 
Db 1789 9 AT CT GACT GT GAAT GGAA- CT GCTACCAAT GT GAAAT CTT TAGAAAGAT - C CT T GAAAGA 17 842 

Qy 1493 GTGGAAAGTCCTGCCTAACACGTAAGTGCCTTCTT 1527 

II I I I I II I II I I II I I I I I I I II I I 

Db 17841 GT ATAAAAT T CT GC CT AAC AT GT AC GT GAAT T CAT 17807 



RESULT 19 

AC145533 

LOCUS AC145533 159346 bp DNA linear HTG 19-JUL-2003
DEFINITION Lemur catta clone LB2-138H20, WORKING DRAFT SEQUENCE, 5 unordered
pieces.
ACCESSION AC145533
VERSION AC145533.1 GI:32996774
KEYWORDS HTG; HTGS_PHASE1; HTGS_DRAFT.
SOURCE Lemur catta (ring-tailed lemur)
ORGANISM Lemur catta
Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
Mammalia; Eutheria; Primates; Strepsirhini; Lemuridae; Lemur.
REFERENCE 1 (bases 1 to 159346)
AUTHORS Cheng,J.-F., Hamilton,M., Peng,Y., Mukherjee,S., Hosseini,R.,
Peng,Z., Malinov,I. and Rubin,E.M.
TITLE Direct Submission
JOURNAL Unpublished
REFERENCE 2 (bases 1 to 159346)
AUTHORS Cheng,J.-F., Hamilton,M., Peng,Y., Mukherjee,S., Hosseini,R.,
Peng,Z., Malinov,I. and Rubin,E.M.
TITLE Direct Submission
JOURNAL Submitted (19-JUL-2003) Genome Sciences, Lawrence Berkeley National
Laboratory, 1 Cyclotron Rd., Berkeley, CA 94720, USA
COMMENT Draft Sequence Produced by Berkeley PGA
Web site: http://pga.lbl.gov
Center Code: PGABERK
Center Project Name: L105-138H20
Bac Clone Name: LB2-138H20

Additional information on comparative analysis and ordering are
available at:
http://pga.lbl.gov/cgi-bin/search_cvcgd?type=n&value=ABCG5
Funding agent: Programs for Genomic Applications (NHLBI)
if library name is LB1 to LB4, please see website
for a description: http://www-gsd.lbl.gov/cheng/BAC.html
These libraries are available through the BACPAC Resources Center:
http://www.chori.org/bacpac/libraryres.htm as LBNL-1 to LBNL-4.

Summary Statistics:
Sequencing vector: Plasmid; pUC18
Chemistry: Dye-terminator Big Dye
Assembly program: Phrap version 0.990329.

* NOTE: This is a 'working draft' sequence. It currently
* consists of 5 contigs. The true order of the pieces
* is not known and their order in this sequence record is
* arbitrary. Gaps between the contigs are represented as
* runs of N, but the exact sizes of the gaps are unknown.
* This record will be updated with the finished sequence
* as soon as it is available and the accession number will
* be preserved.
* 1 16021: contig of 16021 bp in length
* 16022 16121: gap of unknown length
* 16122 40145: contig of 24024 bp in length
* 40146 40245: gap of unknown length
* 40246 77537: contig of 37292 bp in length
* 77538 77637: gap of unknown length
* 77638 114811: contig of 37174 bp in length
* 114812 114911: gap of unknown length
* 114912 159346: contig of 44435 bp in length.
FEATURES Location/Qualifiers
source 1..159346
/organism="Lemur catta"
/mol_type="genomic DNA"
/db_xref="taxon:9447"
/clone="LB2-138H20"



ORIGIN 



Query Match 17.5%; Score 275.4; DB 2; Length 159346;

Best Local Similarity 58.2%; Pred. No. 8.2e-74;

Matches 781; Conservative 2; Mismatches 458; Indels 102; Gaps 13;



Qy 2 30 T C GAGAT AAGGACACT CT G G C T AAAG GT AC AT C AGAT AAT GGCAT C GT T GG C CAAAT T GG 289 

I I I I I I I II I II II I I I II I I I I I I I I I II I I I I I II I I I I I I I I I I I I 
Db 89750 T C GAGAT AAGAAC TGTCCGGC T AAAGGT T CAT C AGAT AAT G G CAT CT GT GG C CAAAC C C C 89809 



Qy 290 TGAACTGTT ATCTCACGAGGATTCCAGGGCTGGGTAGGATCGGACAGGG 338

II III I I I I II II I III I I I I I I I I I I I

Db 89810 TGCTCTGCCTCCAGAACGGCATCACGAGGGACTCCAGGCCCAGGAGAGGAGCAGGCAGGG 89869

Qy 339 CA--CTCCCATTGGCTCCTCAGTTAAAGCTGCCCTGGAGCCGGACAGGCCACTAGAAAAT 396

Mill I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I

Db 89870 TCACCTGCCACCAGCTCCTCAGCTGAAGCCACTCCGGGGAGCGACAGGTGGCCAGAAAAC 89929

Qy 397 TCACTTGCATTTGCTTCCTGCTAGCCATGGGTGAGCTGCCCTTTCTGAGTCCAGAGGGAG 456

II I I I I I I I I I I I I I I I I I I I I I II I I I I II II III

Db 89930 TCACCCGTCTCTGCTGCCTGCTGGCCATGAGCGACCTCCTATCGTTGGCGCCCAGGGGGT 89989



Qy 457 CCAGAGGGCCTCACATCAACAGAGGGTCTCTGAGCTCCCTGGAGCAAGGTTCGGTCACGG 516 

I I III I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I 
Db 8 9990 CCTTGGGCCTCCCCGTGAGCAGAGGCCCCCGGAGCTCTCTGGAGGAGGTTCCTGCCGCCG 90049 

Qy 517 GCACAGAGGCTCGGCACAGCTTAGGTGTCCTGCATGTGTCCTACAGCGTCAGGTAAG 573 

I I I I I I I I I I I I I I II II I I I 1 I I I I I I I I I I I I I II II I I 
Db 9 0050 CTTCGGAGCCGCGGCACTGCCTGGGCATCTCCCATGCCTCCTACAGCGTCAGGTAAGCCA 90109 

Qy 574 GGGACCTCCACAGCAAAAAGCTAGGCTCTCTGATTGCCTTTTCTGAATGGGTGGG 628 

I I I I I I I I I I I I I I I II II I I I I 

Db 90110 CAGCCTCCCCCTGCCGGCCTGAGGCCTGGGCTCTCACCCCCTCTGCTCACTCATGGGGGC 90169 

Qy 62 9 TGGGCCTGTGGGCTTTGGGTTGTCTGTCCAGCAGATC AGGGTGAAAGT 676 

I I I I I I I II III I I I I I I I I II I ME 

Db 90170 CCTGCCTGGGGCTTTCAGGCTCCCTCTTCAGTGGCCCTCAAGGGGGGAAGAGGAGGAAGC 90229 

Qy 677 GGACAGTCTGTAACAACAGTGAGTCGTTCCTCCTCCTCCTCCTGCGCAGGGCAGAGCCTG 736 

I I I I III I I I I I I I I I I I I I I I I III 

Db 90230 GCCTGGGAAATGGGGAGCAACAGTGAACGCCCCTCCTCCTGCCAGGCCAGGCCCGGCCCA 90289 

Qy 737 GACATTAAAACATGCCCTGCCTGAAGCCGCTTGCTGCTTCTCACTGATTTCTGCTCTCCC 796 

I I I I I I I I I I I I I I I I I I I II I I II 

Db 90290 GGCACAAGCAATGTTGCCAACCTGCGCCCCTGACTCCTACCCTCCCCATTGCCCACCCCT 90349 

Qy 7 97 CTTCCTTGACTCGCCCACCACCTGTCCTGTGTAGATGGAGAAGGCTCGGAGAGTGGGGGT 8 56 

I || I I I I I I I I I I I I I I I II I I I I I I I M I 

Db 90350 GTGTCTGTGG CTGCCCTGTGTGGATGCCGG-GACCAGCGGAATGGGGGT 90397 

Qy 857 GCTGGGGG C ACAAAAT G GAAT GAAC AC T GCT GAAG GAAT GC AGG GT T CACTT CAAGAAGA 916 

I I I II I II I I I I I I I III I I I I I I I I I I I I II II I 
Db 90398 GCT GGGAACATGAGGGGTAATAAAGCCTGGG GAAG CAATGC AGG GTCAACAAC 90450 

Qy 917 AAGCAGT GTGCAGGT GTACCAT CT CCCAGTCAGAGAC CCAGTAAT CAGAGCAGCT - AAT G 975 

I I I I I I I I II I I I I I I I I I I I II 

Db 90451 CTTCCGGTCAGACCGAGAGGGGACATTCAGAACAGCTGCAGG 90492 

Qy 97 6 GGAGGCATGCTCCTTGGGTGGTGGCCAACTTGTCATTATACCTCCAAGGACAACAGAGTG 1035 

I I I I I II I I I I I I I I I I I I I II III I I I I I I I 

Db 90493 CCAGGCCCTCTCCGCAGGTGATGGGCAGCCTCGCCACTGCCCGCCAGGCTCTGCAGAGGA 90552 

Qy 1036 GTACATAAGGCTAAAACAGAGTTGTCAACCTGTCCAGGGGCAACTGGGATGGG 1088 

I III I II II I I I I II I I I II I I I I I I I I I I I I I 

Db 90553 GGGCACAGAGGCTGAATCACCTTAGGAACCTGTCCAGAGACAACTGGGGTGGGTGCAGCT 90612 

Qy 1089 GTAGGGCTGGGAGCAGGGGTCTGGCACCTTCCAGGACCCTACTCTGCCTTTGCCCT 1144 

I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I 
Db 9 0613 GGGATCAGAGCTGGGGGACGGGGTCTGGCCTGTTCCAGGCCCCCATGCTGTCTTTGCCTT 90672 

Qy 1145 TGTGGGATTTCCTTTAAAGCAACCGTGTCGGGCCTTGGTGGAACATCAAATCATGCCAGC 1204 

III I I I I I I I I I I I 1 I I I II I I II I I I I I I II I I I MM III I II I I 
Db 90673 CCCGGGGTTTCCTTTAAAGCAATCGTGTCGGGCCCTGGTGGGACATTGCATCTTGCCGGA 90732 

Qy 1205 AGAAGT G GGAC AG GC AAAT C C T CAAAGAT GTCTCCTT GT AC AT C GAGAGT G GC CAGAT T A 1264 

II Mill I II I I II II I II II I II I II I II I I I M II I I II I II I I Mill I 

Db 90733 AGCAGTGGAACAGGCAGATCCTCAAAGACGTCTCCTTGTACGTT GAGAGT GGGCAGATCA 90792 



Qy 12 65 TGTGCATCTTAGGCAGCTCAGGTAAGTGCCTGGGGGGSCSGGGGCTCCTGTACTTCTAAG 1324 

I II I I I I I I I I I I I II I I I I ! I I M: I I I I I I I I I I I I 

Db 90793 TGTGCATTCTAGGGAGCTCAGGTAA — GCTGGGAAGGAGTTCTCTGAGTTCTCAG 90845 

Qy 1325 GCAGGCTCTGGGAGGCTTTGGCTCYGTCTAAGCACAATGTTTAAGAAGTRAGTTTAAGTT 1384 

I I I I I I I II I I I II I I I I I I I I I I : I I I I I I I I I 
Db 90846 TGAAGGGTTTGGTTTGATCTA — C AC C AC AGT GAAGAAAC AGGT T T AAGT T 90894 

Qy 1385 GTAGAGAGGCAGCCATGCATTTGGCATTTGAATACAATCTGGTGACTTGTCTGGCTGCCA 1444 

I I II II I I I I II I I I I III I I I I I I I I I I I I I I 

Db 908 95 GC T G CAAGAAGT C C GCAAGT T T GAT AT C AGAAT GAAAT T AAAT GAC AT GT CT GAC T GT G A 90954 

Qy 1445 ATAGAACCTAGTACCAAAGTGAAATCTTGAGGAAAATCCCT--GGAAAGAGTGGAAAGTC 1502

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

Db 90955 AT GGAAT CT G GT AT CAAT GT GAAAT CTT T AGAAAGAT CTT T AAAAAAAGAGT AT AAAAT T 91014 

Qy 1503 CTGCCTAACACGTAAGTGCCTTC 1525 

I Mill I I I I I I III 

Db 91015 C C AC CT AAT GT AT AAGT GAAT T C 91037 



RESULT 2 0 

AC146286/C 

LOCUS AC146286 207760 bp DNA linear HTG 15-AUG-2003
DEFINITION Callicebus moloch clone LB5-414K16, WORKING DRAFT SEQUENCE, 2
ordered pieces.
ACCESSION AC146286
VERSION AC146286.2 GI:33667134
KEYWORDS HTG; HTGS_PHASE2; HTGS_DRAFT.
SOURCE Callicebus moloch (Dusky titi)
ORGANISM Callicebus moloch
Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
Mammalia; Eutheria; Primates; Platyrrhini; Cebidae; Callicebinae;
Callicebus.
REFERENCE 1 (bases 1 to 207760)
AUTHORS Cheng,J.-F., Hamilton,M., Peng,Y., Mukherjee,S., Hosseini,R.,
Peng,Z., Malinov,I. and Rubin,E.M.
TITLE Direct Submission
JOURNAL Unpublished
REFERENCE 2 (bases 1 to 207760)
AUTHORS Cheng,J.-F., Hamilton,M., Peng,Y., Mukherjee,S., Hosseini,R.,
Peng,Z., Malinov,I. and Rubin,E.M.
TITLE Direct Submission
JOURNAL Submitted (02-AUG-2003) Genome Sciences, Lawrence Berkeley National
Laboratory, 1 Cyclotron Rd., Berkeley, CA 94720, USA
REFERENCE 3 (bases 1 to 207760)
AUTHORS Cheng,J.-F., Hamilton,M., Peng,Y., Mukherjee,S., Hosseini,R.,
Peng,Z., Malinov,I. and Rubin,E.M.
TITLE Direct Submission
JOURNAL Submitted (15-AUG-2003) Genome Sciences, Lawrence Berkeley National
Laboratory, 1 Cyclotron Rd., Berkeley, CA 94720, USA
COMMENT On Aug 15, 2003 this sequence version replaced gi:33413351.



Sequence Produced by Berkeley PGA 
Web site: http://pga.lbl.gov 
Center Code: PGABERK 



Center Project Name: T039 
Bac Clone Name: LB5-414K16 



This sequence has been compared to sequences of other species 
using Vista (http://www-gsd.lbl.gov/VISTA). The results can be 
viewed at: 

http://pga.lbl.gov/cgi-bin/search_cvcgd?type=n&value=ABCG5

The order-orientation of the draft sequence was accomplished by 
using: 

Avid (http://baboon.math.berkeley.edu/mavid),

Lagan (http://lagan.stanford.edu/) and paired end information. 

Funding agent: Programs for Genomic Applications (NHLBI) 

If the Bac Library Name is LB1 to LB4, please see website 

for the description: http://www-gsd.lbl.gov/cheng/BAC.html 

These libraries are available through the BACPAC Resources Center: 

http://www.chori.org/bacpac/libraryres.htm as LBNL-1 to LBNL-4. 

Summary Statistics: 
Sequencing vector: Plasmid; pUC18 
Chemistry: Dye-terminator Big Dye 
Assembly program: Phrap version 0.990329.

* NOTE: This is a 'working draft' sequence. It currently 

* consists of 2 contigs. Gaps between the contigs

* are represented as runs of N. The order of the pieces 

* is believed to be correct as given, however the sizes 

* of the gaps between them are based on estimates that have 

* provided by the submittor. 

* This sequence will be replaced 

* by the finished sequence as soon as it is available and 

* the accession number will be preserved. 

* 1 74764: contig of 74764 bp in length 

* 74765 74864: gap of unknown length 

* 74865 207760: contig of 132896 bp in length. 
FEATURES Location/Qualifiers 

source 1. .207760 

/organism="Callicebus moloch" 
/mol_type="genomic DNA"
/db_xref="taxon: 9523" 
/clone="LB5-414K16" 

ORIGIN 



Query Match 16.7%; Score 261.8; DB 2; Length 207760; 

Best Local Similarity 56.2%; Pred. No. 1.5e-69; 

Matches 803; Conservative 2; Mismatches 554; Indels 70; Gaps 14;

Qy 12 0 CTTTGCTCCTTAGAGCTGGGGCACCTGAGCCCTCCTCTGTGCCAGCCTTTCTCCCAGCAT 17 9 

I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I II I 

Db 140179 CTCTGTTTCCTGGAGCAGGGACACCTCAGCCTCCTGCCCTGGGCCCGGCTCTCCCAGCAT 140120



Qy 18 0 TCCTYTCTGGCAAACACTTCCTATAAACACACCGTGTGTTCTGCCTATTGTCGAGATAAG 239 

I II I : I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I 
Db 140119 TCCTCTCTGGCAAGCCCA-CCTACAAACACAT-GTGTGTTCTGCCCTCTCTCAAGATAAG 140062



Qy 24 0 GACACTCTGGCTAAAGGTACATCAGATAATGGCATCGTTGGCCAAATTGGTGAACTGTTA 2 99 

I I I I I I I I II I I I I I I I I I I I II I II I I II I I I I I I I I I I I III I 
Db 140061 GACGCGCTGGCTAAAGGTACATCAGATAA-GGCCTCCGTGGCCAAGTCCCAGTCCTGCCA 140003

Qy 300 TCTCACGAGGATTCCAGGGCTGGGTAGGATCGGACAGGGCACTCCCATTGGCTCCTCAGT 359 

II II II I I I I I I II I II I I I I I I I I I II 
Db 140002 TCCTGAGGGACTCCGGGGTCAGGTGGAGCCGGCAGGGCAGTCTGCCACTGGCTCGCCAAC 
139943 

Qy 360 TAAAGCTGCCCTGGAGCCGGACAGGCCACTAGAAAATTCACTTGCATTTGCTTCCTGCTA 419 

I III III I I I I I I I I I I I I I I I I I II I I I I I I I I I I 

Db 139942 TGCAGCCACTCCGAGGAGGGTCAGGCCGCCAGAAAATCTGCTCAGCTTTGCTGCCCGTTG 

139883 

Qy 420 GCCATGGGTGAGCTGCCCTTTCTGAGTCCAGAGGGAGCCAGAGGGCCTCACATCAACAGA 479 

I I I I I I I I I I I I I I I I I I I I III II III II I I I I I I I I I 

Db 139882 GCCATGGGTGACCTTCCATCTTTGACCCCCGGAGGGTCCATGGG ACTCCTAAACAGA 

139826 

Qy 48 0 GGGTCTCTGAGCTCCCTGGAGCAAGGTTCGGTCACGGGCACAGAGGCTCGGCACAGCTTA 539 

I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

Db 139825 GGCTCTCAGAGCTCCCTGGAGGGGGCTCCTGCCACTGCACCTGAGCCT CACAGTCTG 

139769 

Qy 54 0 GGTGTCCTGCATGTGTCCTACAGCGTCAGGTAAGG G G AC C T C C AC AG 58 6 

I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

Db 13 97 68 GGAGTCCTGCATGCCTCCTACAGCATCAGGTAAGGCAGAGCCCTTGCTGCAGCTCCTCCC 

139709 

Qy 587 CAAAAAGCTAGGCTCTCTGATTGCCTTTTCTGAATGGGTGGGTGGGCCTGTGGGCTTTGG 64 6 

III III I I I I I I I I I I III 
Db 13 97 08 CAGGAGCACGGGGCCCTGCGTTCACCCTCTGCTGCCTTTTTTCACTCCTGAGCTTCCTGG 
139649 

Qy 647 GTT GT CT GTC CAGCAGAT CAGGGT GAAAGTGGACAGT CT GTAACAACAGT GAGT C GTT CC 706 

II II I II I I I I I I I I I 

Db 13 964 8 CTGGGGACTTTGGGCTCCCTCTTCAGTGGATCAGGTGGAGAGAAGAGAGGGGGGAGGGCT 

139589 

Qy 7 07 TCCTCCTCCTCCTGCGCAGGGCAGAGCCTGGACATTAAAACATGCCCTGCCTGAAGCCGC 766 

I II I I I I I I I I I I III I II 

Db 13958 8 GCACTGGGAAATGGGGAGCAACAGTGAATGGCCCCT CCCCCTACCCAGGGAAGGGCCT GG 

139529 

Qy 767 TTGCTGCTTCTCACTGATTTCTGCTCTCCCCTTCCTTGACTCGCCCACCACCTGTCCTGT 826 

I I I I I I I I I I I I I I I I I I I I II I I I I 

Db 139528 GTATAAACAAAGTGGCAGCTGTGCCCTGCCTACCCCAGTGT CTACCGCCTGCCCTCT 

139472 

Qy 82 7 GT AGAT G GAGAAG G C T C G G AGAGT GGGGGTGCTGGGGG CAC AAAAT G GAAT GAAC AC 883 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I 

Db 139471 GT G GAT G GAGAGAAT C T G G GGAAT GGGGG-GCT GG GAGT ACAAG GAGT C T T GAAAC C AG G 

139413 



Qy 884 TGCTGAAGGAATGCAGGGTTCACTTCAAGAAGAAAGCAGTGTGCAGGTGTACCATCTCCC 943



II I I I I I I I II Mill III I II || I | | | | 

Db 139412 T GAC GAAT G CAG GGAC AGT C AC CT C C C AGC CAAAT G GGCAGGACAT T C GGAG GAGC T CC A 

139353 

Qy 94 4 AGTCAGAGACCCAGTAATCAGAGCAGCTAATGGGAGGCATGCTCCTTGGGTGGTGGCCAA 10 03 

III II I I I I III I I || 

Db 139352 G CACAG GCCCGCTCCC T AGGT GACAGACAG C CT C GGT C ACT AC C T G C CAG GT T CT AC AGA 

139293 

Qy 1004 C T T GT CAT TAT AC C T C CAAGGACAAC AGAGT GGT ACAT AAG GC TAAAAC AGAGT T GT CAA 1063 

I I M Ml Ml I I I I Ml 

Db 139292 GGAGGATGCCGAGGCTGAAACACATTAGAAACCTGTCTGAAGATAACTGG 

139243 

Qy 1064 CCTGTCCAGGGGCAACTGGGATGGGGTAGGGCTGGGAGCAGGGGTCTGGCACCTTCCAGG 1123

II I I I I I I ' M I I I I I I I I I I I I I I I I I I | | | | | | 

Db 139242 GGCGGGGGGGACACAGGTGGGATCAATGCTGGGGACCTGGGTGTAGCCCCTTCCAGG 

139186 

Qy 112 4 ACCCTACTCTGCCTTTGCCCTTGTGGGATTTCCTTTAAAGCAACCGTGTCGGGCCTTGGT 118 3 

H M I II I M I M I I I I M I II I II I I I I I I I I I I I I I I M I II I I MM 
Db 13918 5 GCCACACGCTGCCTTTGCCTTCCTGGGATTTCCTTTAAAGCCACCGTGTGGGGCCCTGGT 

139126 

Qy 1184 GGAACATCAAATCATGCCAGCAGAAGTGGGACAGGCAAATCCTCAAAGATGTCTCCTTGT 1243

II II IN Ml I I M I M I I I I II I II II I I I II I I II II I I II I I 

Db 13 912 5 GGGAC GT CACAT C T T G C CAG C GGC AGT GGAC CAGGCAGAT C C T C AAAGAC GT CT C CT T GT 

139066 

Qy 1244 ACATCGAGAGTGGCCAGATTATGTGCATCTTAGGCAGCTCAGGTAAGTGCCTGGGGGGSC 1303 

I I I I I I I I M I I I II I I I II II II II M I I I I I I II II I I | | : 

Db 13 9065 ACGTGGAGAGCGGGCAGATCATGTGCATCCTAGGAAGCTCAGGTAAGCTTGGGAAGAAG- 

139007 

Qy 1304 SGGGGCTCCTGTACTTCTAAGGCAGGCTCTGGGAGGCTTTGGCTCYGTCTAAGCACAATG 1363 

I I I II I I I I I I I II I I I I 

Db 139006 GATTTTAAAAAGGCTTTGGCTTGAGTTAAACTCCACC 

138970 

Qy 1364 T T T AAG AAGT RAGT T T AAGT T GT AGAGAG G CAG C CAT G CAT T T G G CAT T T GAAT AC AAT C 1423 

I I I I I I Mill II III III I || M I I I I I I 

Db 13 8969 CTGAAGAA-ACAGATACAGTTGTAGCAAGAAAGCTGCAGGTCTGATATTAGAATGAAATC 

138911 

Qy 1424 T G GT GAC TTGTCTGGCTGC C AAT AGAAC C T AGT AC CAAAGT GAAAT CT T GAG GAAAAT C C 14 8 3 

I Ml I I I I I I II I II II I I I I II I I I I I I I I M II I I I I I II II I 
Db 138 910 T AAT GA — T GT C T GAC T GT GAAT AGAAC C T GC T AC CAAT GT GAAAT C T AT AGAAAGAT - C 

138854 

Qy 1484 CTGGAAAGAGTGGAAAGTCCTGCCTAACACGTAAGTGCCTTCTTTGCTT 1532 

I M I I I I II I I I I I I I II I II I I II II I || IMM II 
Db 138853 C T G GAAAGAGT AT GAAAT C CT G C C T AAC AT GT AC AT GAAT T CAT TAT T T 138805 



RESULT 21 
AF404106 

LOCUS AF404106 4899 bp DNA linear PRI 14-AUG-2001

DEFINITION Homo sapiens clone BAC 32814 sterolin 2 (ABCG8) and sterolin 1
(ABCG5) genes, partial cds.
ACCESSION AF404106
VERSION AF404106.1 GI:15150315
KEYWORDS
SOURCE Homo sapiens (human)
ORGANISM Homo sapiens
Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
Mammalia; Eutheria; Primates; Catarrhini; Hominidae; Homo.
REFERENCE 1 (bases 1 to 4899)
AUTHORS Lu,K., Lee,M.H., Hazard,S., Brooks-Wilson,A., Hidaka,H., Kojima,H.,
Ose,L., Stalenhoef,A.F., Mietinnen,T., Bjorkhem,I., Bruckert,E.,
Pandya,A., Brewer,H.B. Jr., Salen,G., Dean,M., Srivastava,A. and
Patel,S.B.
TITLE Two genes that map to the STSL locus cause sitosterolemia: genomic
structure and spectrum of mutations involving sterolin-1 and
sterolin-2, encoded by ABCG5 and ABCG8, respectively
JOURNAL Am. J. Hum. Genet. 69 (2), 278-290 (2001)
MEDLINE 21344600
PUBMED 11452359
REFERENCE 2 (bases 1 to 4899)
AUTHORS Lu,K., Lee,M. and Patel,S.B.
TITLE Direct Submission
JOURNAL Submitted (11-JUL-2001) Division of Endocrinology, Diabetes and
Medical Genetics, Medical University of South Carolina, 114 Doughty
St, STB 541, Charleston, SC 29403
FEATURES Location/Qualifiers
source 1..4899
/organism="Homo sapiens"
/mol_type="genomic DNA"
/db_xref="taxon:9606"
/clone="BAC 32814"
gene complement(<1..>3668)
/gene="ABCG8"
mRNA complement(<3606..>3668)
/gene="ABCG8"
/product="sterolin 2"
CDS complement(<3606..3668)
/gene="ABCG8"
/codon_start=1
/product="sterolin 2"
/protein_id="AAK85386.1"
/db_xref="GI:15150316"
/translation="MAGKAAEERGLPKGATPQDTS"
exon complement(3606..>3668)
/gene="ABCG8"
/number=1
misc_feature 3669..4043
/note="contains 5'UTR and promoter regions for ABCG5 and
ABCG8"
gene <4044..>4899
/gene="ABCG5"
mRNA join(<4044..4186,4770..>4891)
/gene="ABCG5"
/product="sterolin 1"
CDS join(4044..4186,4770..>4891)
/gene="ABCG5"
/codon_start=1
/product="sterolin 1"
/protein_id="AAK85387.1"
/db_xref="GI:15150317"
/translation="MGDLSSLTPGGSMGLQVNRGSQSSLEGAPATAPEPHSLGILHAS
YSVSHRVRPWWDITSCRQQWTRQILKDVSLYVESGQIMCILGSS"
exon <4044..4186
/gene="ABCG5"
/number=1
exon 4770..4891
/gene="ABCG5"
/number=2

ORIGIN 

Query Match 15.6%; Score 244.8; DB 9; Length 4899; 

Best Local Similarity 55.0%; Pred. No. 2.5e-64; 

Matches 651; Conservative 1; Mismatches 488; Indels 44; Gaps 7; 

Qy 12 0 CTTTGCTCCTTAGAGCTGGGGCACCTGAGCCCTCCTCTGTGCCAGCCTTTCTCCCAGCAT 17 9 

| | | I I 1 I I I I I I I I I II I I I II I I I M I I I I I I I I I I I I I 

Db 3744 CTCTGTTTCTTGGAGCAGGGACACCTCGGCCTCCTGCCCTGGGCCCGTCTCTCCCAGCAT 3803 

Qy 180 TCCTYTCTGGCAAACACTTCCTATAAACACACCGTGTGTTCTGCCTATTGTCGAGATAAG 239 

||||: I I I I I I I I I I I I I I I I I I I I I 1 I I I M 

Db 3804 TCCTTGCTGGCAAGCCCACC TACAAACGTGTGTGTTCTTGCCCACTGTCAAGATAAG 3860 

Qy 240 GAC ACT C T GGCT AAAGGT ACAT CAGAT AAT GG C AT C GT T G G C CAAAT T GGT GAACT GT T A 299 

| I I I M I I I II I I I I I I I I I I I I M I I I I I II I I I I I I I I I M I I 

Db 3861 GACGCGCTGGCTAAAGGTACATCAGATAATGGTCTCCGTGGCCAAGTCCCAGTCCTGCTG 3920 

Qy 300 TCTCACGAGGATTCCAGGGCTGGGTAGGATCGGACAGGGCACTCCCATTGGCTCCTCAGT 359 

|| | | | 1 I I II III I I I I I I I I I I II 

Db 3921 TCCCAAGGGACTCCGGGGTCAGGTGGAGCAGGCAGGGCAGTCTGCCACGGGCTCCCCAAC 398 0 

Qy 360 TAAAGCTGCCCTGGAGCCGGACAGGCCACTAGAAAATTCACTTGCATTTGCTTCCTGCTA 419 

| I M I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I 

Db 3981 TGAAGCCACTCTGGGGAGGGTCCGGCCACCAGAAAATTTGCCCAGCTTTGCTGCCTGTTG 4 040 

Qy 420 GCCATGGGTGAGCTGCCCTTTCTGAGTCCAGAGGGAGCCAGAGGGCCTCACATCAACAGA 479

II I I I I I I I I I I I I I I I I I IM I I Ml I I I II I I M I I I 
Db 4041 GC CAT GGGTGACCTCT CAT CTTTGACCCCCGGAGGGTCCATGGGTCTCCAAGTAAACAGA 4100 

Qy 480 GGGTCTCTGAGCTCCCTGGAGCAAGGTTCGGTCACGGGCACAGAGGCTCGGCACAGCTTA 539

I | | I I I I I I I II II I I I I I I I I I I I I I I I I I I I I I M I I I 

Db 4101 GGCTCCCAGAGCTCCCTGGAGGGGGCTCCTGCCACCGCCCCGGAGCCT CACAGCCTG 4157 

Qy 54 0 GGTGTCCTGCATGTGTCCTACAGCGTCAGGTAAGGGGACCTCCACAGCAAAAAGCTAGGC 599 

II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I M 

Db 4158 GGCATCCTCCATGCCTCCTACAGCGTCAGGTAAGG CAGAGCCCTTGC 4204

Qy 600 TCTCTGATTGCCTTTTCTGAATGGGTGGGTGGGCCTGTGGGCTTTGGGTTGTCTGTCCAG 659 

| M lit t I II I I I I I I I I I 

Db 4205 TGCTGCTGCTCCCCCAGGAGTGCGGGGCCCGGCGCTCACCCCTCTGCTGCCTTTCTTCAC 4264 

Qy 660 CAGATCAGGGTGAAAGTGGACAGTCTGTAACAACAGTGAGTCGTTCCTCCTCCTCCTCCT 719 

I I I I I I I I I I I I II I I I 

Db 4265 TCTTTAAGTGCCAGTCTGGGCACTTCGGGCTCCCTCTTTAGTGGATCGGGTGGAGAGAGG 4324 



Qy 720 GCGCAGGGCAGAGCCTGGACATTAAAACATGCCCTGCCTGAAGCCGCTTGCTGCTTCTCA 77 9 

I I I I I I I I I I I I I I III I I I I I I 

Db 4325 AGAGGGAGAAGGGCTGTTGCTGGGAAACATGGAGCGACAGTGAATGGCCCCTCCCCCTGC 4384 

Qy 78 0 CTGATTTCTGCTCTCCCCTTCCTTGACTCGCC CACCACCTGTCCTGTGTAGAT 832 

I III II 1 II I I II I I I I I I I 

Db 438 5 CCAGGGAAGGGCCTGGGCATAAACAAAGTGGCAGCAGTGCCCTGCCAACCCAGTGTCTAC 4 444 

Qy 833 GGAGAAGGCTC GGAGAGTGGGGGTGCTGGGGGCACAAAATGGAATGAACACTGC 88 6 

II III I I I II I I I I I I I I I III I I I I I III 

Db 4445 GGCCTGCCCTCTGTGGATGGGAATGGGGGTACTGCGAATGCAAGGAGTCTTGAAACCTGG 4504 

Qy 887 T GAAG GAAT G CAGGGTT C AC T T CAAGAAGAAAG CAGT GT G CAG GT GT AC CAT CT C C C AGT 94 6 

I I I I I I I I I I I II I II I I I I I II II I I 

Db 4505 T GAAAGAAT GCAGGG AC AGC CAC CT C GCAGCCAAAC G GACAG GAC AT T C AGAG C 4558 

Qy 947 CAGAGACCCAGTAATCAGAGCAGCTAATGGGAGGCATGCTCCTTGGGTGGTGGCCAACTT 1006 

I II I I I I I I I I I I I I I I I I I I I I I 

Db 4559 AACTCCAGCACAGGCCCCCTCCCTACGTGGCAGACAGCCTCAGTCGCTATCTGCCAGGTT 4618 

Qy 1007 GT CATT AT ACCT C CAAGGACAACAGAGT GGTACATAAGGCTAAAAC AGAGTT GT CAAC CT 1066 

I I I I I I I I M I I I II 

Db 4619 C T AC AGAGGAGG G CG C AGAGACT GAAAC AC GT T AGGAGCCT GTCCGGAGACTAC 4672 

Qy 1067 GTCCAGGGGCAACTGGGATGGGGTAGGGCTGGGAGCAGGGGTCTGGCACCTTCCAGGACC 1126 

III II II I MINI I I I I I I I I I I I I I I I I I I I I 

Db 4 673 TGGGGGTGGGGCACAGGTAGGATCAATGCTGGGGACCTGGGTGTGGCCCCTTCCAGGGCC 4732 

Qy 1127 CTACTCTGCCTTTGCCCTTGTGGGATTTCCTTTAAAGCAACCGTGTCGGGCCTTGGTGGA 1186 

I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I MM II I I I I I II I I I 
Db 4733 CCAAGCTGCCTTTGCCTTCCTGGGGTTTCCTTTAAAGCCACCGCGTGAGGCCCTGGTGGG 4792 

Qy 1187 AC AT CAAAT CAT GC C AGC AGAAGT G G GACAG GCAAAT C CT CAAAGAT GT CT C CT T GT AC A 1246 

I I I I II III I I I I I I II I I I I I I I I I I I I I II I I I I I I I M I I I II I I I I I I 
Db 4 7 93 AC AT CAC AT CT T G CC GGC AGC AGT G GAC CAGGC AGAT C CT CAAAGAT GTCTCCTT GT AC G 4 8 52 

Qy 1247 T C GAGAGT GGC C AGATT AT GT G CAT C T T AGGCAGCT CAGGT AAG 1290 

I I I I I I II II I I I I I I I II I I I I II I I II I I I I II II I 
Db 4853 T GGAGAGC G G G C AGAT CAT GT G CAT C C T AGGAAGCT CAGGT AAG 48 96 



RESULT 22 

AX456521 

LOCUS AX456521 5459 bp DNA linear PAT 06-JUL-2002
DEFINITION Sequence 43 from Patent WO0227016.
ACCESSION AX456521
VERSION AX456521.1 GI:21715411
KEYWORDS
SOURCE synthetic construct
ORGANISM synthetic construct
artificial sequences.
REFERENCE 1
AUTHORS Patel,S.B. and Dean,M.
TITLE Gene involved in dietary sterol absorption and excretion and uses
therefor
JOURNAL Patent: WO 0227016-A 43 04-APR-2002;
THE DEPARTMENT OF HEALTH AND HUMAN SERVICES (US) ; Patel,
Shailendra B. (US) ; Dean, Michael (US)
FEATURES Location/Qualifiers
source 1..5459
/organism="synthetic construct"
/mol_type="unassigned DNA"
/db_xref="taxon:32630"
/note="Primer"

ORIGIN 

Query Match 15.5%; Score 242.8; DB 6; Length 5459; 

Best Local Similarity 54.9%; Pred. No. 1.1e-63;

Matches 649; Conservative 1; Mismatches 488; Indels 44; Gaps 7;

Qy 12 0 CTTTGCTCCTTAGAGCTGGGGCACCTGAGCCCTCCTCTGTGCCAGCCTTTCTCCCAGCAT 17 9 

I I I I I I I I I I I I I I I I I I I I III I I II I I II I I I I I I I II 

Db 4309 CTCTGTTTCTTGGAGCAGGGACACCTCGGCCTCCTGCCCTGGGCCCGTCTCTCCCAGCAT 4368 

Qy 18 0 T CCT YT CT GGCAAAC ACT T C C T AT AAAC AC ACC GT GT GT T C T G C CT AT T GT C GAGAT AAG 23 9 

I II I : I I I I I II I I I II I I I I I I I I I I I I I I I I I I I I 

Db 4369 TCCTTGCTGGCAAGCCCACC TACAAACGTGTGTGTTCTTGCCCACTGTCAAGATAAG 4425 

Qy 24 0 GACACTCTGGCTAAAGGTACATCAGATAATGGCATCGTTGGCCAAATTGGTGAACTGTTA 2 99 

I I I I I I I I I I I I I I I I I I I I I I II I I I I I I II I II I I I I I I II I I 

Db 4 426 GACGCGCTGGCTAAAGGTACATCAGATAATGGTCTCCGTGGCCAAGTCCCAGTCCTGCTG 44 85 

Qy 300 T CT CAC GAGGAT T C C AGG G CT GG GT AGGAT C GGACAGGG C ACT C C CAT T G G CT C C T CAGT 359 

I I I I I I II I I I I I I II I I I I I I I I I I I I I I 

Db 4 486 TCCCAAGGGACTCCGGGGTCAGGTGGAGCAGGCAGGGCAGTCTGCCACGGGCTCCCCAAC 4545 

Qy 360 TAAAGCTGCCCTGGAGCCGGACAGGCCACTAGAAAATTCACTTGCATTTGCTTCCTGCTA 419 

I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I 

Db 4546 TGAAGCCACTCTGGGGAGGGTCCGGCCACCAGAAAATTTGCCCAGCTTTGCTGCCTGTTG 4 605 

Qy 420 G CCAT G GGT GAGC T GCCCTTTCT GAGT C CAGAGGGAGC C AGAG GGC CT CAC AT CAAC AGA 479 

I I I I I I I I I I I I I I I I I I I III II I II 1 I I II I I I I I I I 
Db 4 606 GCC AT G GGT GAC C T CT CAT CT TT GAC CC C C G GAGGGT C C AT GG GT CT C CAAGT AAAC AGA 4665 

Qy 4 80 GGGTCTCTGAGCTCCCTGGAGCAAGGTTCGGTCACGGGCACAGAGGCTCGGCACAGCTTA 539 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I 

Db 4 666 GGCTCCCAGAGCTCCCTGGAGGGGGCTCCTGCCACCGCCCCGGAGCCT CACAGCCTG 4722 

Qy 54 0 GGTGTCCTGCATGTGTCCTACAGCGTCAGGTAAGGGGACCTCCACAGCAAAAAGCTAGGC 599 

II II I I I I I I I I I I I I I II I I I I I II I I I I II I II 

Db 4723 GGCATCCTCCATGCCTCCTACAGCGTCAGGTAAGG CAGAGCCCTTGC 4769 

Qy 600 TCTCTGATTGCCTTTTCTGAATGGGTGGGTGGGCCTGTGGGCTTTGGGTTGTCTGTCCAG 659 

I II II I I I I I I I I I I I I I I 

Db 4 77 0 TGCTGCTGCTCCCCCAGGAGTGCGGGGCCCGGCGCTCACCCCTCTGCTGCCTTTCTTCAC 4 82 9 

Qy 660 CAGATCAGGGTGAAAGTGGACAGTCTGTAACAACAGTGAGTCGTTCCTCCTCCTCCTCCT 719 

I I I I I I I I I I I I II I I I 

Db 4 830 TCTTTAAGTGCCAGTCTGGGCACTTCGGGCTCCCTCTTTAGTGGATCGGGTGGAGAGAGG 4 889 

Qy 72 0 GCGCAGGGCAGAGCCTGGACATTAAAACATGCCCTGCCTGAAGCCGCTTGCTGCTTCTCA 779 

I I I I I I I I I I II I I III I III II 

Db 4 890 AGAGG GAGAAGGG CT GT T G CT GGGAAAC AT G GAG C GACAGT GAAT GGCCCCTCCCCCTGC 4 949 



Qy 780 CTGATTTCTGCTCTCCCCTTCCTTGACTCGCC CACCACCTGTCCTGTGTAGAT 832 

I III II I II I I I I I I I I I I I 

Db 4 950 CCAGGGAAGGGCCTGGGCATAAACAAAGTGGCAGCAGTGCCCTGCCAACCCAGTGTCTAC 500 9 

Qy 833 GGAGAAGGCTC G GAGAGT GGGGGTGCTGGGGG C ACAAAAT G GAAT GAACACT G C 886 

II III I I I II I 1 I I I I I I I III I I I I I III 

Db 5010 GGCCTGCCCTCTGTGGATGGGAATGGGGGTACTGCGAATGCAAGGAGTCTTGAAACCTGG 5069 

Qy 8 87 T GAAGGAAT G C AG GGT T C ACT T CAAGAAGAAAGC AGT GT G CAGGT GT AC CAT CT C C C AGT 94 6 

I I I I I I I I I I I I I I II I I I I I II II II 

Db 5070 T GAAAGAAT GCAGGG ACAGC C AC CT C GC AG C CAAAC G G AC AG G AC ATT CAGAG C 5123 

Qy 947 CAGAGACCCAGTAATCAGAGCAGCTAATGGGAGGCATGCTCCTTGGGTGGTGGCCAACTT 1006 

I II I I I I I I I I I I I I I I I I I II I I 

Db 512 4 AACTCCAGCACAGGCCCCCTCCCTACGTGGCAGACAGCCTCAGTCGCTATCTGCCAGGTT 518 3 

Qy 1007 GTCATTATACCTCCAAGGACAACAGAGTGGTACATAAGGCTAAAACAGAGTTGTCAACCT 1066 

I I I I I I I I II I I I II 

Db 518 4 C T AC AGAG GAGG G C GC AGAGAC T GAAAC AC GT T AG GAG C C T GT C C GGAGACTAC 5237 

Qy 1067 GTCCAGGGGCAACTGGGATGGGGTAGGGCTGGGAGCAGGGGTCTGGCACCTTCCAGGACC 1126 

III II II I I I I I I I I I I I I I I I I I I I I I I I I I I I 

Db 5238 TGGGGGTGGGGCACAGGTAGGATCAATGCTGGGGACCTGGGTGTGGCCCCTTCCAGGGCC 52 97 

Qy 1127 CTACTCTGCCTTTGCCCTTGTGGGATTTCCTTTAAAGCAACCGTGTCGGGCCTTGGTGGA 118 6 

II I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

Db 5298 CCAAGCTGCCTTTGCCTTCCTGGGGTTTCCTTTAAAGCCACCGCGTGAGGCCCTGGTGGG 5357 

Qy 1187 ACATCAAATCATGCCAGCAGAAGTGGGACAGGCAAATCCTCAAAGATGTCTCCTTGTACA 1246

I I I II I III I I I I MM I I I II Mill! II I I I I II I I I II M I I II II I I I 
Db 5358 ACATCACATCTTGCCGGCAGCAGTGGACCAGGCAGATCCTCAAAGATGTCTCCTTGTACG 5417 

Qy 1247 T C GAGAGT GGC CAGATT AT GT GCAT CTT AGGCAGCT CAGGT A 12 88 

I I I I I I II I I I I I II I I I I I II I I I I I I II I I I I I I 
Db 5418 T GGAGAGCG G G CAGAT CAT GT GCAT C CT AG GAAGCT CAGGT A 5459 



RESULT 23 

F351812S01/c

LOCUS F351812S01 2809 bp DNA linear PRI 10-AUG-2001
DEFINITION Homo sapiens sterolin-2 (ABCG8) gene, exon 1.
ACCESSION AF351812
VERSION AF351812.1 GI:15146431
KEYWORDS
SEGMENT 1 of 13
SOURCE Homo sapiens (human)
ORGANISM Homo sapiens
Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
Mammalia; Eutheria; Primates; Catarrhini; Hominidae; Homo.
REFERENCE 1 (bases 1 to 2809)
AUTHORS Lu,K., Lee,M.H., Hazard,S., Brooks-Wilson,A., Hidaka,H., Kojima,H.,
Ose,L., Stalenhoef,A.F., Mietinnen,T., Bjorkhem,I., Bruckert,E.,
Pandya,A., Brewer,H.B. Jr., Salen,G., Dean,M., Srivastava,A. and
Patel,S.B.
TITLE Two genes that map to the STSL locus cause sitosterolemia: genomic
structure and spectrum of mutations involving sterolin-1 and
sterolin-2, encoded by ABCG5 and ABCG8, respectively
JOURNAL Am. J. Hum. Genet. 69 (2), 278-290 (2001)
MEDLINE 21344600
PUBMED 11452359
REFERENCE 2 (bases 1 to 2809)
AUTHORS Lu,K.
TITLE Direct Submission
JOURNAL Submitted (21-FEB-2001) Division of Endocrinology, Diabetes and
Medical Genetics, Medical University of South Carolina, 114 Doughty
St, STB 541, Charleston, SC 29403, USA
FEATURES Location/Qualifiers
source 1..2809
/organism="Homo sapiens"
/mol_type="genomic DNA"
/db_xref="taxon:9606"
/chromosome="2"
/map="between D2S2294 and D2S2298"
/clone="1081G2; 32814"
/cell_type="ES cell"
exon <1227..1289
/gene="ABCG8"
/number=1



ORIGIN 



Query Match 15.4%; Score 241.2; DB 9; Length 2809;

Best Local Similarity 54.8%; Pred. No. 3.3e-63;

Matches 648; Conservative 1; Mismatches 489; Indels 44; Gaps 7;



Qy 120 CTTTGCTCCTTAGAGCTGGGGCACCTGAGCCCTCCTCTGTGCCAGCCTTTCTCCCAGCAT 179

I I I I I I I I I II I I I I I I I I III I I II I I I I I I I I I I I I I

Db 1151 CTCTGTTTCCTGGAGCAGGGACACCTCGGCCTCCTGCCCTGGGCCCGTCTCTCCCAGCAT 1092

Qy 180 TCCTYTCTGGCAAACACTTCCTATAAACACACCGTGTGTTCTGCCTATTGTCGAGATAAG 239

1111:111111111 I II I I I I I I I I I I I I I I I I I I I I

Db 1091 TCCTTGCTGGCAAGCCCACC TACAAACGTGTGTGTTCTTGCCCACTGTCAAGATAAG 1035

Qy 240 GACACTCTGGCTAAAGGTACATCAGATAATGGCATCGTTGGCCAAATTGGTGAACTGTTA 299

I I I I I I I I I I I II I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I

Db 1034 GACGCGCTGGCTAAAGGTACATCAGATAATGGTCTCCGTGGCCAAGTCCCAGTCCTGCTG 975

Qy 300 TCTCACGAGGATTCCAGGGCTGGGTAGGATCGGACAGGGCACTCCCATTGGCTCCTCAGT 359

I I I I I I II I I I I I I II I I I I I I I I I I I I I I

Db 974 TCCCAAGGGACTCCGGGGTCAGGTGGAGCAGGCAGGGCAGTCTGCCACGGGCTCCCCAAC 915

Qy 360 TAAAGCTGCCCTGGAGCCGGACAGGCCACTAGAAAATTCACTTGCATTTGCTTCCTGCTA 419

I I I I I I I I I I I II I I I I I I I I I I I II I I I I I I II I I I M I

Db 914 TGAAGCCACTCTGGGGAGGGTCCGGCCACCAGAAAATTTGCCCAGCTTTGCTGCCTGTTG 855

Qy 420 GCCATGGGTGAGCTGCCCTTTCTGAGTCCAGAGGGAGCCAGAGGGCCTCACATCAACAGA 479

I I I I I I I I I II I I I I I I I I III II III III II I I I I I I I
Db 854 GCCATGGGTGACCTCTCATCTTTGACCCCCGGAGGGTCCATGGGTCTCCAAGTAAACAGA 795

Qy 480 GGGTCTCTGAGCTCCCTGGAGCAAGGTTCGGTCACGGGCACAGAGGCTCGGCACAGCTTA 539

I I I I I I I I I I II I I I I II I M I III I I I III II I I I I I I I

Db 794 GGCTCCCAGAGCTCCCTGGAGGGGGCTCCTGCCACCGCCCCGGAGCCT CACAGCCTG 738

Qy 540 GGTGTCCTGCATGTGTCCTACAGCGTCAGGTAAGGGGACCTCCACAGCAAAAAGCTAGGC 599



Db 737 GGCATCCTCCATGCCTCCTACAGCGTCAGGTAAGG CAGAGCCCTTGC 691 

Qy 600 TCTCTGATTGCCTTTTCTGAATGGGTGGGTGGGCCTGTGGGCTTTGGGTTGTCTGTCCAG 659 

I II I I I I I I I I I I I Mill 

Db 690 TGCTGCTGCTCCCCCAGGAGTGCGGGGCCCGGCGCTCACCCCTCTGCTGCCTTTCTTCAC 631 

Qy 660 CAGATCAGGGTGAAAGTGGACAGTCTGTAACAACAGTGAGTCGTTCCTCCTCCTCCTCCT 719 

I I I I I I I I I I I I II I I I 

Db 630 TCTTTAAGTGCCAGTCTGGGCACTTCGGGCTCCCTCTTTAGTGGATCGGGTGGAGAGAGG 571 

Qy 720 GCGCAGGGCAGAGCCTGGACATTAAAACATGCCCTGCCTGAAGCCGCTTGCTGCTTCTCA 779 

I I I I I I I I I I I I I I III I I I I I I 

Db 570 AGAGGGAGAAGGGCTGTTGCTGGGAAACATGGAGCGACAGTGAATGGCCCCTCCCCCTGC 511 

Qy 780 CTGATTTCTGCTCTCCCCTTCCTTGACTCGCC CACCACCTGTCCTGTGTAGAT 832 

I 111 II I II I I I I I I I I I I I 

Db 510 CCAGGGAAGGGCCTGGGCATAAACAAAGTGGCAGCAGTGCCCTGCCAACCCAGTGTCTAC 451 

Qy 833 GGAGAAGGCTC G GAGAGT GGGGGTGCTGGGGG C ACAAAAT G GAAT GAAC AC T GC 886 

II III I I I I I I II I I I I I I III I I I I I III 

Db 450 GGCCTGCCCTCTGTGGATGGGAATGGGGGTACTGCGAATGCAAGGAGTCTTGAAACCTGG 391 

Qy 8 87 T GAAGGAAT G C AG G GT T C AC T T CAAGAAGAAAGCAGT GT GCAGGT GT AC CAT CT C CC AGT 94 6 

I I I I I I I I I I I I I I II I I I I I II II II 

Db 390 T GAAAGAAT GC AG G G ACAGC CAC CT C G CAG C CAAAC GGACAG GAC AT T C AG AG C 337 

Qy 947 C AGAG AC C CAGTAAT C AGAG C AGC T AAT GG GAG GC AT GCTCCTTGGGTGGTGGC CAACT T 1006 

I II I I I I I I I I I I I I I I I I I I I I I 

Db 336 AACTCCAGCACAGGCCCCCTCCCTACGTGGCAGACAGCCTCAGTCGCTATCTGCCAGGTT 277 

Qy 1007 GTCATTATACCTCCAAGGACAACAGAGTGGTACATAAGGCTAAAACAGAGTTGTCAACCT 1066 

I I I II I I I I I II I II 

Db 276 CT ACAGAGGAGGGCGCAGAGACT GAAACAC GT TAGGAGC CT G T C C G GAG AC T AC 223 

Qy 1067 GTCCAGGGGCAACTGGGATGGGGTAGGGCTGGGAGCAGGGGTCTGGCACCTTCCAGGACC 1126 

III II II I I I I I II I I I I I I I I I M I I I I I I I I I 

Db 222 TGGGGGTGGGGCACAGGTAGGATCAATGCTGGGGACCTGGGTGTGGCCCCTTCCAGGGCC 163 

Qy 1127 CTACTCTGCCTTTGCCCTTGTGGGATTTCCTTTAAAGCAACCGTGTCGGGCCTTGGTGGA 1186 

I t I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I II I I I I I I 
Db 162 CCAAGCTGCCTTTGCCTTCCTGGGGTTTCCTTTAAAGCCACCGCGTGAGGCCCTGGTGGG 103 

Qy 1187 AC AT C AAAT CAT G C CAG C AGAAGT G GGACAG GCAAAT C CT CAAAGAT GTCTCCTT GT ACA 1246 

I I I I I I III MM MM II I II I I II I I I I II II I I I I M II I II I I II I I I 

Db 102 AC AT CAC AT CTTGCCGG CAG C AGT G GAC CAG G CAGAT C C T CAAAGAT GTCTCCTT GT AC G 43 

Qy 1247 TCGAGAGTGGCCAGATTATGTGCATCTTAGGCAGCTCAGGTA 12 8 8 

I I I II I II I I II I II I I I I II I II I I I I I I II II I I 
Db 42 T G GAGAG C G G G CAGAT CAT GT GCAT C CT AG GAAGCT CAG GT A 1 



RESULT 24 
AC146464 

LOCUS AC146464 202533 bp DNA linear HTG 19-AUG-2003 

DEFINITION Saimiri sciureus clone CH254-84A11, WORKING DRAFT SEQUENCE. 
ACCESSION AC146464 



VERSION AC146464.1 GI:33636782
KEYWORDS HTG; HTGS_PHASE2; HTGS_DRAFT.
SOURCE Saimiri sciureus (common squirrel monkey)
ORGANISM Saimiri sciureus
Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
Mammalia; Eutheria; Primates; Platyrrhini; Cebidae; Cebinae;
Saimiri.
REFERENCE 1 (bases 1 to 202533)
AUTHORS Cheng,J.-F., Hamilton,M., Peng,Y., Mukherjee,S., Hosseini,R.,
Peng,Z., Malinov,I. and Rubin,E.M.
TITLE Direct Submission
JOURNAL Unpublished
REFERENCE 2 (bases 1 to 202533)
AUTHORS Cheng,J.-F., Hamilton,M., Peng,Y., Mukherjee,S., Hosseini,R.,
Peng,Z., Malinov,I. and Rubin,E.M.
TITLE Direct Submission
JOURNAL Submitted (14-AUG-2003) Genome Sciences, Lawrence Berkeley National
Laboratory, 1 Cyclotron Rd., Berkeley, CA 94720, USA
REFERENCE 3 (bases 1 to 202533)
AUTHORS Cheng,J.-F., Hamilton,M., Peng,Y., Mukherjee,S., Hosseini,R.,
Peng,Z., Malinov,I. and Rubin,E.M.
TITLE Direct Submission
JOURNAL Submitted (19-AUG-2003) Genome Sciences, Lawrence Berkeley National
Laboratory, 1 Cyclotron Rd., Berkeley, CA 94720, USA
COMMENT Sequence Produced by Berkeley PGA
Web site: http://pga.lbl.gov
Center Code: PGABERK
Center Project Name: S030
Bac Clone Name: CH254-84A11

This sequence has been compared to sequences of other species
using Vista (http://www-gsd.lbl.gov/VISTA). The results can be
viewed at:

http://pga.lbl.gov/cgi-bin/search_cvcgd?type=n&value=ABCG5

The order-orientation of the draft sequence was accomplished by
using:

Avid (http://baboon.math.berkeley.edu/mavid),

Lagan (http://lagan.stanford.edu/) and paired end information.

Funding agent: Programs for Genomic Applications (NHLBI)

Summary Statistics:
Sequencing vector: Plasmid; pUC18
Chemistry: Dye-terminator Big Dye 
Assembly program: Phrap version 0.990329. 

* NOTE: This is a 'working draft' sequence. It currently

* consists of 1 contigs. Gaps between the contigs

* are represented as runs of N. The order of the pieces 

* is believed to be correct as given, however the sizes 

* of the gaps between them are based on estimates that have 

* provided by the submittor. 

* This sequence will be replaced 

* by the finished sequence as soon as it is available and 

* the accession number will be preserved. 

* 1 202533: contig of 202533 bp in length. 



FEATURES Location/Qualifiers 
source 1..202533

/organism="Saimiri sciureus"
/mol_type="genomic DNA"
/db_xref="taxon:9521"
/clone="CH254-84A11"

ORIGIN 

Query Match 15.2%; Score 238.8; DB 2; Length 202533; 

Best Local Similarity 55.3%; Pred. No. 2.2e-62; 

Matches 784; Conservative 2; Mismatches 564; Indels 68; Gaps 14; 

Qy 120 CTTTGCTCCTTAGAGCTGGGGCACCTGAGCCCTCCTCTGTGCCAGCCTTTCTCCCAGCAT 17 9 

I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I 

Db 32371 CTCTGTTTCCTGGAGCAGGGACGCTTCAGCCTCCTGCCCTGGGTCTGGCTCTCCCAGCAT 32430 

Qy 18 0 TCCTYTCTGGCAAACACTTCCTATAAACACACCGTGTGTTCTGCCTATTGTCGAGATAAG 2 39 

I I I I : I I I I I I I I I I II I I I I I I I I I I I I I I I I II I I I I I I I I I I I I 
Db 32431 TCCTCTCTGGCAAGCCCA-CCTGCAAACACA-TGTGTGTTCTGCCCTCTCTCAAGATAAG 32488 

Qy 24 0 GACACT C T G G C T AAAG GT AC AT CAGATAAT G G CAT C GTT GGC CAAAT T G GT GAAC T GTT A 2 99 

I I I I I I II I I I I 1 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I III I 

Db 324 89 GACGCGCTGGCTAAAGGTACATCAGATAACGGCCTCCTTGGCCAAGTCCTAGTCCTGCCA 32548 

Qy 300 T CT C AC GAGGAT T C C AG GG C T GG GT AG GAT C GGAC AGGG C ACT C C CAT TGGCTCCT C AGT 359 

II I I I I I I III I I I I I I I I I 1 I I I I 

Db 32549 TC CTGAGGCTCAGGTGGAGCCAGCAGGGCAGTCTGCCACTGGCTCCCCAAC 32599 

Qy 360 TAAAGCTGCCCTGGAGCCGGACAGGCCACTAGAAAATTCACTTGCATTTGCTTCCTGCTA 419 

I III I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I 

Db 32 600 TGCAGCCACTCCGAGGAGGGTCAGGCTACCAGAAAATCTGCTCAGCTTTGCTGCCCGTTG 32 659 

Qy 42 0 G C CAT G G GT GAGCT GCCCTTTCT GAGT C C AGAG G GAG CC AGAGGG C CT C AC AT CAAC AGA 47 9 

I I I I I I I I I I I I I I I I I I I I II II I I I I I I I II I I I I I I 

Db 32660 G C CAT G G GT GAC CT T C CAT C T T T GAC C C C C AGAGGGT CC AT AG G AC T C CAGG GAAAC AGA 32719 

Qy 4 80 GGGTCTCTGAGCTCCCTGGAGCAAGGTTCGGTCACGGGCACAGAGGCTCGGCACAGCTTA 539 

I I I I I II I I I I I I I I I I I II I I I I I I I I I I I Mill I 
Db 32720 GGCTCCCAGAGCTCCCTGGAGGGGGCTCTTGCCACTGCACCTGAGCCT CACAGTCTG 32776 

Qy 540 GGTGTCCTGCATGTGTCCTACAGCGTCAGGTAAGGGGACCTCCACAGCAAAAAGCTAGGC 599

I I II I I I I I I I I I I I I I I I I II I I I I I I I I II I I 

Db 32777 GGCGTCCTCCATGCCTCCTACAGCATCAGGTAAGGCAGAGCCCTTGCTGCTGCTCCTCCC 32 836 

Qy 600 TCTCTGATTGCCTTTTCTGAATGGGTGGGTGGGCCTGTGGGCTTTGGGTTGTCTGTCCAG 659 

II II I I I I I I I I I I 

Db 32837 CAGGAGCACGGGGCCCTACGTTCGCCCTCTGCTGCCTTTTTTCACTCTTGAGCTGCCTGG 328 96 

Qy 660 CAGATCAGGGTGAAAGTGGACAGTCTGTAACAACAGT GAGT C GTT CCTCCTCCTCCTCCT 719 

I I I I I I I I I I I I II 

Db 32 897 CT GGG GAC TTTGGGCTCCCTCTT C AGT G GAT C G GAT GGAGAGAAGAGAGC AGG GAG G GC T 32956 

Qy 72 0 GCGCAGGGCAGAGCCTGGACATTAAAACATGCCCTGCCTGAAGCCGCTTGCTGCTTCT — 777 

I I I I I I I III II I I I I I III II II 

Db 32 957 GCACTGGGAAATAGGTAGCAACAGTAAATGGCTCCTCCCTCTGCCCAGGGAAGGGCCTGG 33016 



Qy 778 CACTGATTTCTGCTCTCCCCTTCCTTGACTC — GCCCACCACCTGTCCTGTGTA 829



Db 33017 TAATAAACAAAGTTGCAGCTGTGCCCTGCCTACCCCAGTGTCCACCGCTTGCCCTCTGCG 33076 

Qy 8 30 GATGGAGAAGGCTCGGAGAGTGGGGGTGCTGGGGGCACAAAATGGAATGAACACTGCTGA 8 89 

I I I I I I I I I I I I I I I I I II I I I III I | | I I | | | | 

Db 33077 GATGGAGAGAATCTGGGGAATGGGG — CCTGGGAATGCAAGGAGTCTTGAATCCAGGTG- 33133 

Qy 8 90 AGGAAT GCAGGGTT CACTT CAAGAAGAAAGCAGT GT GCAGGT GTAC CAT CT C C CAGT CAG 949 

I I I I I I I I I I I III I I I I I II I 

Db 33134 AC GAAT GCAGG GACAAC CAC CT C C C AGACAC AT GGGCAGGAC AT T C G GAG CAGCT C C AGC 33193 

Qy 950 AGACCCAGTAATCAGAGCAGCTAATGGGAGGCATGCTCCTTGGGTGGTGGCCAACTTGTC 1009 

I I I I I I I I I I I I I I I I I I I I 

Db 33194 ACAGGCCCCCTTCCTAGGAGACAGACAGCCTCAGTCGCTACCTGCCAGGTTCTACAGAGG 33253 

Qy 1010 AT TAT AC C T CCAAG GACAAC AGAGT GGT ACAT AAG GCT AAAAC AGAGT T GT CAAC CT GT C 1069 

M MM I I II I I I I I I M 
Db 33254 ATGGAGGCTGAAACACAACACGTTAGGAGCCTGTCTGAAGATAACTGGGGT 33304 

Qy 107 0 CAGGGGCAACTGGGATGGGGTAGGGCTGGGAGCAGGGGTCTGGCACCTTCCAGGACCCTA 1129 

II I I M II I MUM I II M I I I I I M I I I I I I II I 

Db 33305 GGCACACAGGTGGGATCAATGCTGGGGACCTGGGTGTAGCCCCTTCCAGGGCCCCA 33360 

Qy 1130 CTCTGCCTTTGCCCTTGTGGGATTTCCTTTAAAGCAACCGTGTCGGGCCTTGGTGGAACA 1189 

I I M M M I II I I I I I I I I I i M I I I M I I I II I I II I III I I I I II Ml 

Db 33361 TGCTGCCTTTGCCTTCCTGGGATTTCCTTTAAAGCCACCGTGTGGAGCCCTGGTGGGACA 33420 

Qy 1190 T CAAAT CAT GC C AG CAGAAGT G GGAC AG GCAAAT C CT CAAAGAT GTCTCCTT GTAC AT C G 1249 

I M M I I II I I I I I I I I I II I II I I I M I I I I I I I I I II I I I II II 

Db 33421 T CAC AT CT T GC CGGCGACAGT GGAC CAGGCAGATC CT CAAAGAC GTCT C CTT ATAT GT GG 33480 

Qy 12 50 AGAGT GGCCAGAT TAT GTGCAT CTT AGGCAGCTCAGGTAAGTGCCTGGGGGGSCSGGGGC 1309 

II II I I I II I II I I M I II I II I I I II I I M I I I I III : II 

Db 33481 AGAGCGGGCAGGTCATGTGCATCCTAGGAAGCTCAGGTAAG CTTGGGATGAAGGA 33535 

Qy 1310 TCCTGTACTTCTAAGGCAGGCTCTGGGAGGCTTTGGCTCYGTCTAAGCACAATGTTTAAG 1369 

I M I I I I I II I II I I I II I I I I I II I 

Db 33536 TTCTGAA AAGGCTTTGGCTTGAGTTAAACTCCACCCTGAAG 33576 

Qy 137 0 AAGT RAGT T TAAGT T GT AGAGAG GC AG C CAT GCAT T T GG C AT T T GAAT ACAAT C T GGT GA 142 9 

M II I I II M II II II II I I I I I I I I I II I I II I I Ml 

Db 33577 AA- AC AGAT AGAT T T GT AGCAAGAAAG C CAC AGGT T T GAT AT T AGAAT GAAAT CT AAT G A 33635 

Qy 14 3 0 CTTGTCTGGCT GC CAAT AGAAC CT AGT AC CAAAGT GAAAT CTT GAGGAAAAT C C C T G GAA 1489 

II I M I II I I II I I I II I I I I II I I I I I I I I I I I I I I II I I I I I I 

Db 33636 — T GTCTGACTGTTAATAGAACCTGCCAC CAAT GT GAAAT CTATAGCAAGAT-C CTT GAA 33692 

Qy 1490 AGAGTGGAAAGTCCTGCCTAACACGTAAGTGCCTTCTT 1527 

I I I I I II I II I I I I I I I I II I I II I II I I 
Db 33693 AGAT T AT AAAAT C C T G C C T AAC AC AT AC G T GAAT T CAT 33730 



RESULT 25 
AX747300 

LOCUS AX747300 2512 bp mRNA linear PAT 20-JUN-2003 

DEFINITION Sequence 825 from Patent EP1308459. 

ACCESSION AX747300
VERSION AX747300.1 GI:32131688
KEYWORDS
SOURCE Homo sapiens (human)
ORGANISM Homo sapiens
Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
Mammalia; Eutheria; Primates; Catarrhini; Hominidae; Homo.
REFERENCE 1
AUTHORS Isogai,T., Sugiyama,T., Otsuki,T., Wakamatsu,A., Sato,H., Ishii,S.,
Yamamoto,J.I., Isono,Y., Hio,Y., Otsuka,K., Nagai,K., Irie,R.,
Tamechika,I., Seki,N., Yoshikawa,T., Otsuka,M., Nagahari,K. and
Masuho,Y.

TITLE Full-length cDNA sequences 

JOURNAL Patent: EP 1308459-A 825 07-MAY-2003; 

Helix Research Institute (JP) ; Research Association for 
Biotechnology (JP) 
FEATURES Location/Qualifiers 
source 1..2512

/organism="Homo sapiens"
/mol_type="mRNA"
/db_xref="taxon:9606"

ORIGIN 

Query Match 13.7%; Score 215; DB 6; Length 2512; 

Best Local Similarity 54.5%; Pred. No. 5e-55; 

Matches 576; Conservative 0; Mismatches 450; Indels 31; Gaps 6;
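
The percentages in the summary lines can be reproduced from the counts printed with them. The sketch below is a reading of the numbers in this report, not the search program's documented method. It assumes that Query Match is Score divided by the perfect score of 1570 given in the run header, and that Best Local Similarity is Matches divided by all aligned columns (Matches + Conservative + Mismatches + Indels). The function names are hypothetical.

```python
def query_match_pct(score: float, perfect_score: float) -> float:
    """Score as a percentage of the perfect (self-match) score, one decimal."""
    return round(100.0 * score / perfect_score, 1)


def best_local_similarity_pct(matches: int, conservative: int,
                              mismatches: int, indels: int) -> float:
    """Identical columns as a percentage of all aligned columns, one decimal."""
    aligned_columns = matches + conservative + mismatches + indels
    return round(100.0 * matches / aligned_columns, 1)


if __name__ == "__main__":
    # Counts taken from the summary lines of this result (Score 215, perfect score 1570).
    print(query_match_pct(215, 1570))
    print(best_local_similarity_pct(576, 0, 450, 31))
```

Under these assumptions the same two functions also agree with the summary lines of the other results in this listing, for example Score 191.4 with 195 matches and 6 mismatches.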

Qy 237 AAGGACACTCTGGCTAAAGGTACATCAGATAATGGCATCGTTGGCCAAATTGGTGAACTG 296

I I I I I I I I I I I I I I I I M I I I I I I I I I I I I I I I II I I I II I I I III 
Db 1 AAGGAC GC GCT GGCT AAAGGT AC AT CAGAT AAT GGTCTCCGTGGCCAAGTCCCATTCCTG 60 

Qy 297 TTATCTCACGAGGATTCCAGGGCTGGGTAGGATCGGACAGGGCACTCCCATTGGCTCCTC 356

I I I I I I I II I I I I I I II I Mill I I I I II I 

Db 61 CTGTCCCAAGGGACTCCGGGGTCAGGTGGAGCAGGCAGGGCAGTCTGCCACGGGCTCCCC 120

Qy 357 AGTTAAAGCTGCCCTGGAGCCGGACAGGCCACTAGAAAATTCACTTGCATTTGCTTCCTG 416 

I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I 

Db 121 AACTGAAGCCACTCTGGGGAGGGTCCGGCCACCAGAAAATTTGCCCAGCTTTGCTGCCTG 180 

Qy 417 CTAGCCATGGGTGAGCTGCCCTTTCTGAGTCCAGAGGGAGCCAGAGGGCCTCACATCAAC 476

I I I I I I I I I I I I I I I I I I I I III II III III II I Ml 
Db 181 TTGGCCATGGGTGACCTCTCATCTTTGACCCCCGGAGGGTCCATGGGTCTCCAAGTAAAC 240 

Qy 477 AGAGGGTCTCTGAGCTCCCTGGAGCAAGGTTCGGTCACGGGCACAGAGGCTCGGCACAGC 536

I I II I II I M II II II II I M I I I I III I I I III II I I II I I 
Db 241 AGAGGCTCCCAGAGCTCCCTGGAGGGGGCTCCTGCCACCGCCCCGGAGCCT CACAGC 297

Qy 537 TTAGGTGTCCTGCATGTGTCCTACAGCGTCAGGTAAGGGGACCTCCACAGCAAAAAGCTA 596 

III M II II I I I II I II I M II II I II II I I II III 
Db 298 CTGGGCATCCTCCATGCCTCCTACAGCGTCAGGTAAGGCAGAGCCC TTGCTG 349

Qy 597 GGCTCTCTGATTGCCTTTTCTGAATGGGTGGGTGGGCCTGTGGGCTTTGGGTTGTCTGTC 656 

II II II II I I I I I I I I I 

Db 350 CTGCTGCTCCCCCAGGAGTGCGGGGCCCGGCGCTCACCCCTCTGCTGCCTTTCTTCACTC 409 

Qy 657 CAGCAGATCAGGGTGAAAGTGGACAGTCTGTAACAACAGTGAGTCGTTCCTCCTCCTCCT 716

I I I I I I II I I I I I III 

Db 410 TTTAAGTGCCAGTCTGGGCACTTCGGGCTCCCTCTTTAGTGGATCGGGTGGAGAGAGGAG 469



Qy 717 CCTGCGCAGGGCAGAGCCTGGACATTAAAACATGCCCTGCCTGAAGCCGCTTGCTGCTTC 776

I I I I I I I I I I I I I I I I M M Ml I I I I 

Db 470 AGGGAGAAGGGCTGTGCTGGGAAACATGGAGCGACAGTGAATGGCCCCTCCCCCTGCCCA 529

Qy 777 TCACTGATTTCTGCTCTCCCCTTCCTTGACTC-GCCCACCACCTGTCCTGTGTAGATGGA 835

| | I I II I 1 I I I I I I I I I I I I I 

Db 530 GGGAAGGGCCTGGGCATAAACAAAGTGGCAGCAGTGCCCTGCCAACCCAGTGTCTACGGC 589

Qy 836 GAAGGCTC GGAGAGTGGGGGTGCTGGGGGCACAAAATGGAATGAACACTGCTGA 889

I | I I I I II I 1 I I I I I I 1 II I I I II I I I I I I I 

Db 590 CTGCCCTCTGTGGATGGGAATGGGGGTACTGCGAATGCAAGGAGTCTTGAAACCTGGTGA 649

Qy 890 AGGAATGCAGGGTTCACTTCAAGAAGAAAGCAGTGTGCAGGTGTACCATCTCCCAGTCAG 949

I I I I I I I II I I II I I I I I M I I I I I 

Db 650 AAGAAT GCAGGG AC AG C CAC CT C GC AGC C AAAC G GACAGGAC AT T CAGAGCAAC 703 

Qy 950 AGACCCAGTAATCAGAGCAGCTAATGGGAGGCATGCTCCTTGGGTGGTGGCCAACTTGTC 1009 

II I I III II II III I I I II I I I I I 

Db 704 TCCAGCACAGGCCCCCTCCCTACGTGGCAGACAGCCTCAGTCGCTATCTGCCAGGTTCT- 762 

Qy 1010 ATTATACCTCCAAGGACAACAGAGTGGTACATAAGGCTAAAACAGAGTTGTCAACCTGTC 1069

I I Mill I I I I II I II I 

Db 763 ACAGAGGAGGGCGCAGAGACTGAAACACGTTAGGAGCCTGTCCGGAGACTACTG 816

Qy 1070 CAGGGGCAACTGGGATGGGGTAGGGCTGGGAGCAGGGGTCTGGCACCTTCCAGGACCCTA 1129

III II II I I I I II I I I I I I II II I I I I I I I I I I I I I 

Db 817 GGGTGGGGCACAGGTAGGATCAATGCTGGGGACCTGGGTGTGGCCCCTTCCAGGGCCCCA 876 

Qy 1130 CTCTGCCTTTGCCCTTGTGGGATTTCCTTTAAAGCAACCGTGTCGGGCCTTGGTGGAACA 1189 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I M I I II III 
Db 877 AGCTGCCTTTGCCTTCCTGGGGTTTCCTTTAAAGCCACCGCGTGAGGCCCTGGTGGGACA 936 

Qy 1190 TCAAATCATGCCAGCAGAAGTGGGACAGGCAAATCCTCAAAGATGTCTCCTTGTACATCG 1249

I M III I I I I MM I I I I I I I II II II M II II I I I I I I I I I I I I II I I I I 
Db 937 TCACATCTTGCCGGCAGCAGTGGACCAGGCAGATCCTCAAAGATGTCTCCTTGTACGTGG 996

Qy 1250 AGAGTGGCCAGATTATGTGCATCTTAGGCAGCTCAGG 1286

I I I I II I I I I I I II II I I I I II II II II I II I 
Db 997 AGAGCGGGCAGATCATGTGCATCCTAGGAAGCTCAGG 1033



RESULT 26

AK091997

LOCUS AK091997 2512 bp mRNA linear PRI 15-JUL-2002

DEFINITION Homo sapiens cDNA FLJ34678 fis, clone LIVER2003065.

ACCESSION AK091997

VERSION AK091997.1 GI:21750490

KEYWORDS oligo capping; fis (full insert sequence).

SOURCE Homo sapiens (human)

ORGANISM Homo sapiens

Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
Mammalia; Eutheria; Primates; Catarrhini; Hominidae; Homo.
REFERENCE 1

AUTHORS Ninomiya,K., Wagatsuma,M., Kanda,K., Kondo,H., Yokoi,T.,
Kodaira,H., Furuya,T., Takahashi,M., Kikkawa,E., Omura,Y., Abe,K.,
Kamihara,K., Katsuta,N., Sato,K., Tanikawa,M., Yamazaki,M.,
Sugiyama,T., Irie,R., Otsuki,T., Sato,H., Ota,T., Wakamatsu,A.,
Ishii,S., Yamamoto,J., Isono,Y., Kawai-Hio,Y., Saito,K.,
Nishikawa,T., Kimura,K., Yamashita,H., Matsuo,K., Nakamura,Y.,
Sekine,M., Kikuchi,H., Murakawa,K., Kanehori,K.,
Takahashi-Fujii,A., Oshima,A., Sugiyama,A., Kawakami,B., Suzuki,Y.,
Sugano,S., Nagahari,K., Masuho,Y., Nagai,K. and Isogai,T.

TITLE NEDO human cDNA sequencing project

JOURNAL Unpublished

REFERENCE 2 (bases 1 to 2512)

AUTHORS Isogai,T. and Yamamoto,J.

TITLE Direct Submission

JOURNAL Submitted (04-JUL-2002) Takao Isogai, FLJ Project(HRI Team); 2-6-7
Kazusa-Kamatari, Kisarazu, Chiba 292-0812, Japan
(E-mail: genomics@hri.co.jp, Tel: 81-438-52-3975, Fax: 81-438-52-3986)

COMMENT NEDO human cDNA sequencing project supported by Ministry of
Economy, Trade and Industry of Japan; cDNA full insert sequencing:
Research Association for Biotechnology (RAB); cDNA library
construction: Helix Research Institute (HRI) (supported by Japan
Key Technology Center etc.); 5'- & 3'-end one pass sequencing: RAB,
HRI, and Biotechnology Center, National Institute of Technology and
Evaluation; clone selection for full insert sequencing: HRI and
RAB; annotation: HRI and RAB.

FEATURES Location/Qualifiers
source 1..2512

/organism="Homo sapiens"
/mol_type="mRNA"
/db_xref="taxon:9606"
/clone="LIVER2003065"
/tissue_type="liver"
/clone_lib="LIVER2"
/note="cloning vector: pME18SFL3"

ORIGIN



Query Match 13.7%; Score 215; DB 9; Length 2512;

Best Local Similarity 54.5%; Pred. No. 5e-55;

Matches 576; Conservative 0; Mismatches 450; Indels 31; Gaps 6;



Qy 237 AAGGACACTCTGGCTAAAGGTACATCAGATAATGGCATCGTTGGCCAAATTGGTGAACTG 296

I I I I I I I I I I I 1 I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I III
Db 1 AAGGACGCGCTGGCTAAAGGTACATCAGATAATGGTCTCCGTGGCCAAGTCCCATTCCTG 60

Qy 297 TTATCTCACGAGGATTCCAGGGCTGGGTAGGATCGGACAGGGCACTCCCATTGGCTCCTC 356
I I I II I I II I I I II I II I II III I I I I I I I

Db 61 CTGTCCCAAGGGACTCCGGGGTCAGGTGGAGCAGGCAGGGCAGTCTGCCACGGGCTCCCC 120

Qy 357 AGTTAAAGCTGCCCTGGAGCCGGACAGGCCACTAGAAAATTCACTTGCATTTGCTTCCTG 416

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I

Db 121 AACTGAAGCCACTCTGGGGAGGGTCCGGCCACCAGAAAATTTGCCCAGCTTTGCTGCCTG 180

Qy 417 CTAGCCATGGGTGAGCTGCCCTTTCTGAGTCCAGAGGGAGCCAGAGGGCCTCACATCAAC 476

I I I I I I I I I I I I I 1 I I I I I I III II III III II I I I I

Db 181 TTGGCCATGGGTGACCTCTCATCTTTGACCCCCGGAGGGTCCATGGGTCTCCAAGTAAAC 240

Qy 477 AGAGGGTCTCTGAGCTCCCTGGAGCAAGGTTCGGTCACGGGCACAGAGGCTCGGCACAGC 536

Mill II I M I I I I I I I I I I I I I I I Ml I I I III II II I II I

Db 241 AGAGGCTCCCAGAGCTCCCTGGAGGGGGCTCCTGCCACCGCCCCGGAGCCT CACAGC 297



Qy 537 TTAGGTGTCCTGCATGTGTCCTACAGCGTCAGGTAAGGGGACCTCCACAGCAAAAAGCTA 596 

III I I I I I I I I I I I I I I I I I I I I I M I I I I I II III 
Db 298 CTGGGCATCCTCCATGCCTCCTACAGCGTCAGGTAAGGCAGAGCCC TTGCTG 349 

Qy 597 GGCTCTCTGATTGCCTTTTCTGAATGGGTGGGTGGGCCTGTGGGCTTTGGGTTGTCTGTC 656 

II II II I I I I I I I I I I I 

Db 350 CTGCTGCTCCCCCAGGAGTGCGGGGCCCGGCGCTCACCCCTCTGCTGCCTTTCTTCACTC 409

Qy 657 CAGCAGATCAGGGTGAAAGTGGACAGTCTGTAACAACAGTGAGTCGTTCCTCCTCCTCCT 716 

I I I I I I II I I I I I I I I 

Db 410 TTTAAGTGCCAGTCTGGGCACTTCGGGCTCCCTCTTTAGTGGATCGGGTGGAGAGAGGAG 469

Qy 717 CCTGCGCAGGGCAGAGCCTGGACATTAAAACATGCCCTGCCTGAAGCCGCTTGCTGCTTC 776 

I I I I I I I I I I I I I I I I II II III I I I I 

Db 470 AGGGAGAAGGGCTGTGCTGGGAAACATGGAGCGACAGTGAATGGCCCCTCCCCCTGCCCA 529

Qy 777 TCACTGATTTCTGCTCTCCCCTTCCTTGACTC-GCCCACCACCTGTCCTGTGTAGATGGA 835 

I I I I II I I I I I I I I I I I I I I I 

Db 530 GGGAAGGGCCTGGGCATAAACAAAGTGGCAGCAGTGCCCTGCCAACCCAGTGTCTACGGC 589

Qy 836 GAAGGCTC GGAGAGTGGGGGTGCTGGGGGCACAAAATGGAATGAACACTGCTGA 889

III I I I I I I I I I I I I I I III I I I I I I I I II I 

Db 590 CTGCCCTCTGTGGATGGGAATGGGGGTACTGCGAATGCAAGGAGTCTTGAAACCTGGTGA 649

Qy 890 AGGAATGCAGGGTTCACTTCAAGAAGAAAGCAGTGTGCAGGTGTACCATCTCCCAGTCAG 949

I I I I I I I I I I I II I I I I I II I I I I I 

Db 650 AAGAATGCAGGG ACAGC CACCT C GCAGCCAAACGGACAGGACATT CAGAGCAAC 703 

Qy 950 AGACCCAGTAATCAGAGCAGCTAATGGGAGGCATGCTCCTTGGGTGGTGGCCAACTTGTC 1009 

II I I I I I I I I I I I I I I I I I I I I I I 

Db 704 TCCAGCACAGGCCCCCTCCCTACGTGGCAGACAGCCTCAGTCGCTATCTGCCAGGTTCT- 762

Qy 1010 ATTATACCTCCAAGGACAACAGAGTGGTACATAAGGCTAAAACAGAGTTGTCAACCTGTC 1069 

I I I I II I I I I I II I II I 

Db 763 ACAGAGGAGGGCGCAGAGACTGAAACACGTTAGGAGCCTGTCCGGAGACTACTG 816

Qy 1070 CAGGGGCAACTGGGATGGGGTAGGGCTGGGAGCAGGGGTCTGGCACCTTCCAGGACCCTA 1129

III II II I I I I I I I I I I I I I II I I I I I I I I II I I I I 

Db 817 GGGTGGGGCACAGGTAGGATCAATGCTGGGGACCTGGGTGTGGCCCCTTCCAGGGCCCCA 876 

Qy 1130 CTCTGCCTTTGCCCTTGTGGGATTTCCTTTAAAGCAACCGTGTCGGGCCTTGGTGGAACA 1189 

I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I II I I I I III 
Db 877 AGCTGCCTTTGCCTTCCTGGGGTTTCCTTTAAAGCCACCGCGTGAGGCCCTGGTGGGACA 936 

Qy 1190 TCAAATCATGCCAGCAGAAGTGGGACAGGCAAATCCTCAAAGATGTCTCCTTGTACATCG 1249

III III I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I 
Db 937 TCACATCTTGCCGGCAGCAGTGGACCAGGCAGATCCTCAAAGATGTCTCCTTGTACGTGG 996 

Qy 1250 AGAGTGGCCAGATTATGTGCATCTTAGGCAGCTCAGG 1286

I I I I II Mill I I I I I I I I I I I I I I I I I I II I 
Db 997 AGAGCGGGCAGATCATGTGCATCCTAGGAAGCTCAGG 1033

RESULT 27 
AX320881 

LOCUS AX320881 2258 bp DNA linear PAT 14-DEC-2001 

DEFINITION Sequence 2 from Patent WO0179272. 



ACCESSION AX320881

VERSION AX320881.1 GI:17902431

KEYWORDS

SOURCE Mus musculus (house mouse)

ORGANISM Mus musculus

Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
Mammalia; Eutheria; Rodentia; Sciurognathi; Muridae; Murinae; Mus.
REFERENCE 1

AUTHORS Tian,H., Schultz,J. and Shan,B.

TITLE Sitosterolemia susceptibility gene (ssg): compositions and methods
of use

JOURNAL Patent: WO 0179272-A 2 25-OCT-2001;
Tularik Inc. (US)

FEATURES Location/Qualifiers
source 1..2258

/organism="Mus musculus"
/mol_type="unassigned DNA"
/db_xref="taxon:10090"
/note="mouse sitosterolemia susceptibility gene (SSG)"
CDS 47..2005

/note="unnamed protein product; mouse sitosterolemia
susceptibility gene (SSG) protein"
/codon_start=1
/protein_id="CAD19408.1"
/db_xref="GI:17902432"
/db_xref="REMTREMBL:CAD19408"

/translation="MGELPFLSPEGARGPHINRGSLSSLEQGSVTGTEARHSLGVLHV 
SYSVSNRVGPWWNIKSCQQKWDRQILKDVSLYIESGQIMCILGSSGSGKTTLLDAISG 
RLRRTGTLEGEVFVNGCELRRDQFQDCFSYVLQSDVFLSSLTVRETLRYTAMLALCRS 
SADFYNKKVEAVMTELSLSHVADQMIGSYNFGGISSGERRRVSIAAQLLQDPKVMMLD 
EPTTGLDCMTANQIVLLLAELARRDRIVIWIHQPRSELFQHFDKIAILTYGELVFCG 
TPEEMLGFFNNCGYPCPEHSNPFDFYMDLTSVDTQSREREIETYKRVQMLECAFKESD 
IYHKILENIERARYLKTLPMVPFKTKDPPGMFGKLGVLLRRVTRNLMRNKQAVIMRLV 
QNLIMGLFLIFYLLRVQNNTLKGAVQDRVGLLYQLVGATPYTGMLNAVNLFPMLRAVS 
DQESQDGLYHKWQMLLAYVLHVLPFSVIATVIFSSVCYWTLGLYPEVARFGYFSAALL 
APHLIGEFLTLVLLGIVQNPNIVNSIVALLSISGLLIGSGFIRNIQEMPIPLKILGYF 
TFQKYCCEILWNEFYGLNFTCGGSNTSMLNHPMCAITQGVQFIEKTCPGATSRFTAN 
FLILYGFIPALVILGIVIFKVRDYLISR"



ORIGIN 



Query Match 12.2%; Score 191.4; DB 6; Length 2258;

Best Local Similarity 97.0%; Pred. No. 1.2e-47;

Matches 195; Conservative 0; Mismatches 6; Indels 0; Gaps 0;



Qy 378 GGACAGGCCACTAGAAAATTCACTTGCATTTGCTTCCTGCTAGCCATGGGTGAGCTGCCC 437
I I I I I I I I I I I I I II II I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I | I I | I I I I M I
Db 2 GGACAGGCCACTAGAAAATTCACTTGCATTTGCTTCCTGCTAGCCATGGGTGAGCTGCCC 61

Qy 438 TTTCTGAGTCCAGAGGGAGCCAGAGGGCCTCACATCAACAGAGGGTCTCTGAGCTCCCTG 497
I I I I I M I I I I I II I I I I I I I I I I I I I I I I I I M I I I I I I I I I I I II I I I I I I I I I I I I I
Db 62 TTTCTGAGTCCAGAGGGAGCCAGAGGGCCTCACATCAACAGAGGGTCTCTGAGCTCCCTG 121

Qy 498 GAGCAAGGTTCGGTCACGGGCACAGAGGCTCGGCACAGCTTAGGTGTCCTGCATGTGTCC 557

M I M I M I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I t I I I I I I I I I
Db 122 GAGCAAGGTTCGGTCACGGGCACAGAGGCTCGGCACAGCTTAGGTGTCCTGCATGTGTCC 181

Qy 558 TACAGCGTCAGGTAAGGGGAC 578

I I I I I I I I I I I I III
Db 182 TACAGCGTCAGCAACCGTGTC 202



RESULT 28

AF404107/C

LOCUS AF404107 581 bp DNA linear PRI 14-AUG-2001

DEFINITION Homo sapiens sterolin 1 (ABCG5) and sterolin 2 (ABCG8) genes,
partial cds.

ACCESSION AF404107

VERSION AF404107.1 GI:15150318

KEYWORDS

SOURCE Homo sapiens (human)

ORGANISM Homo sapiens

Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
Mammalia; Eutheria; Primates; Catarrhini; Hominidae; Homo.

REFERENCE 1 (bases 1 to 581)

AUTHORS Lu,K., Lee,M.H., Hazard,S., Brooks-Wilson,A., Hidaka,H., Kojima,H.,
Ose,L., Stalenhoef,A.F., Mietinnen,T., Bjorkhem,I., Bruckert,E.,
Pandya,A., Brewer,H.B. Jr., Salen,G., Dean,M., Srivastava,A. and
Patel,S.B.

TITLE Two genes that map to the STSL locus cause sitosterolemia: genomic
structure and spectrum of mutations involving sterolin-1 and
sterolin-2, encoded by ABCG5 and ABCG8, respectively

JOURNAL Am. J. Hum. Genet. 69 (2), 278-290 (2001)

MEDLINE 21344600

PUBMED 11452359

REFERENCE 2 (bases 1 to 581)

AUTHORS Lu,K., Lee,M. and Patel,S.B.

TITLE Direct Submission

JOURNAL Submitted (11-JUL-2001) Division of Endocrinology, Diabetes and
Medical Genetics, Medical University of South Carolina, 114 Doughty
St, STB 541, Charleston, SC 29403

FEATURES Location/Qualifiers

source 1..581

/organism="Homo sapiens"
/mol_type="genomic DNA"
/db_xref="taxon:9606"
gene complement(<1..>143)
/gene="ABCG5"
mRNA complement(<1..>143)
/gene="ABCG5"
/product="sterolin 1"
CDS complement(<1..143)
/gene="ABCG5"
/codon_start=1
/product="sterolin 1"
/protein_id="AAK85388.1"
/db_xref="GI:15150319"

/translation="MGDLSSLTPGGSMGLQVNRGSQSSLEGAPATAPEPHSLGILHAS
YSV"

exon complement(<1..>143)
/gene="ABCG5"
/number=1
misc_feature 144..518

/note="contains 5'UTR and promoter regions for ABCG5 and
ABCG8"

gene <519..>581

/gene="ABCG8"
mRNA <519..>581

/gene="ABCG8"

/product="sterolin 2"
CDS 519..>581

/gene="ABCG8"

/codon_start=1

/product="sterolin 2"

/protein_id="AAK85389.1"

/db_xref="GI:15150320"

/translation="MAGKAAEERGLPKGATPQDTS"
exon <519..>581

/gene="ABCG8"

/number=1

ORIGIN 

Query Match 11.5%; Score 179.8; DB 9; Length 581; 

Best Local Similarity 65.5%; Pred. No. 4.6e-44; 

Matches 294; Conservative 1; Mismatches 148; Indels 6; Gaps 2; 

Qy 120 CTTTGCTCCTTAGAGCTGGGGCACCTGAGCCCTCCTCTGTGCCAGCCTTTCTCCCAGCAT 179

I I I I I I I I I I I I I I I I I I I I III I I II I I I I I I I I I I I II 

Db 443 CTCTGTTTCTTGGAGCAGGGACACCTCGGCCTCCTGCCCTGGGCCCGTCTCTCCCAGCAT 384

Qy 180 TCCTYTCTGGCAAACACTTCCTATAAACACACCGTGTGTTCTGCCTATTGTCGAGATAAG 239

I I I I : I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I 

Db 383 TCCTTGCTGGCAAGCCCACC TACAAACGTGTGTGTTCTTGCCCACTGTCAAGATAAG 327 

Qy 240 GACACTCTGGCTAAAGGTACATCAGATAATGGCATCGTTGGCCAAATTGGTGAACTGTTA 299

I I I I I I I I I I I I I I I I I I I I 1 I I I I I I I I I II I I I I I II I I I M I 

Db 326 GACGCGCTGGCTAAAGGTACATCAGATAATGGTCTCCGTGGCCAAGTCCCAGTCCTGCTG 267

Qy 300 TCTCACGAGGATTCCAGGGCTGGGTAGGATCGGACAGGGCACTCCCATTGGCTCCTCAGT 359

I I I I I I II I I I I I I I I I I I I II I I II I I I I 

Db 266 TCCCAAGGGACTCCGGGGTCAGGTGGAGCAGGCAGGGCAGTCTGCCACGGGCTCCCCAAC 207

Qy 360 TAAAGCTGCCCTGGAGCCGGACAGGCCACTAGAAAATTCACTTGCATTTGCTTCCTGCTA 419 

I I I I I I I I I I I I I I I I II I I I II I I I M I I I I I I I I I I I I 

Db 206 TGAAGCCACTCTGGGGAGGGTCCGGCCACCAGAAAATTTGCCCAGCTTTGCTGCCTGTTG 147 

Qy 420 GCCATGGGTGAGCTGCCCTTTCTGAGTCCAGAGGGAGCCAGAGGGCCTCACATCAACAGA 479

I I I I I I I I I I I I I I I I I I I III II III III II I I I I I I I 
Db 146 GCCATGGGTGACCTCTCATCTTTGACCCCCGGAGGGTCCATGGGTCTCCAAGTAAACAGA 87

Qy 480 GGGTCTCTGAGCTCCCTGGAGCAAGGTTCGGTCACGGGCACAGAGGCTCGGCACAGCTTA 539

I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I 

Db 86 GGCTCCCAGAGCTCCCTGGAGGGGGCTCCTGCCACCGCCCCGGAGCCT CACAGCCTG 30

Qy 540 GGTGTCCTGCATGTGTCCTACAGCGTCAG 568 

II MM I I I I I II I I I I I I I I I II 
Db 29 GGCATCCTCCATGCCTCCTACAGCGTCAG 1 
NOTE: This record contains 83 individual
sequencing reads that have not been assembled into
contigs. Runs of N are used to separate the reads
and the order in which they appear is completely
arbitrary. Low-pass sequence sampling is useful for
identifying clones that may be gene-rich and allows
overlap relationships among clones to be deduced.
However, it should not be assumed that this clone
will be sequenced to completion. In the event that
the record is updated, the accession number will
be preserved.
Best Local Similarity 64.2%; Pred. No. 3.6e-42;

Matches 292; Conservative 1; Mismatches 157; Indels 5; Gaps 2;



Qy 120 CTTTGCTCCTTAGAGCTGGGGCACCTGAGCCCTCCTCTGTGCCAGCCTTTCTCCCAGCAT 179

I I I I I I I I I II I I I I I I I I III I I II I I I I I I I I I I I I I 

Db 34526 CTCTGTTTCCTGGAGCAGGGACACCTCGGCCTCCTGCCCTGGGCCCGTCTCTCCCAGCAT 34585 

Qy 180 TCCTYTCTGGCAAACACTTCCTATAAACACACCGTGTGTTCTGCCTATTGTCGAGATAAG 239

I I I I : I I I I I I I I I I II I II I I I I I I I I I I I II I I II I I I I 

Db 34586 TCCTTGCTGGCAAGCCCACCTACAAACGT GT GT GTT CT GC C CACT GT CAAGATAAG 34641 



Qy 240 GACACTCTGGCTAAAGGTACATCAGATAATGGCATCGTTGGCCAAATTGGTGAACTGTTA 299
I I I I I I I I I I I I I I I I I I I I I II I I I I I I I II II I I I I I I I I II I

Db 34642 GACGCGCTGGCTAAAGGTACATCAGATAATGGTCTCCGTGGCCAAGTCCCAGTCCTGCTG 34701

Qy 300 TCTCACGAGGATTCCAGGGCTGGGTAGGATCGGACAGGGCACTCCCATTGGCTCCTCAGT 359
I I II I I II I I I I I I II I I I I II I I I I I I I I

Db 34702 TCCCAAGGGACTCCGGGGTCAGGTGGAGCAGGCAGGGCAGTCTGCCACGGGCTCCCCAAC 34761

Qy 360 TAAAGCTGCCCTGGAGCCGGACAGGCCACTAGAAAATTCACTTGCATTTGCTTCCTGCTA 419
I I I I I I I I I I I I I I I I I I I I I I II II I I II I I I I I I I I I

Db 34762 TGAAGCCACTCTGGGGAGGGTCCGGCCACCAAAAAATTTGCCCAGCTTTGCTGCCTGTTG 34821

Qy 420 GCCATGGGTGAGCTGCCCTTTCTGAGTCCAGAGGGAGCCAGAGGGCCTCACATCAACAGA 479
I I I I I I I I I I I I I I I I I I I III II III III II I I I I I I I
Db 34822 GCCATGGGTGACCTCTCATCTTTGACCCCCGGAGGGTCCATGGGTCTCCAAGTAAACAGA 34881



Qy 480 GGGTCTCTGAGCTCCCTGGAGCAAGGTTCGGTCACGGGCACAGAGGCTCGGCACAGCTTA 539 

I I I I I I II I I I I I I I I II I I I I II I I I I I I I I 

Db 34882 AGCTCCCAGAGCTCCCTGG-GAGGGGGGCTCCTGCCACCGCCCCGGAGCCTCACAGCCTG 34940 

Qy 540 GGTGTCCTGCATGTGTCCTACAGCGTCAGGTAAGG 574

II I I I I I I I I I I I I I I I I I II I I I II I I 
Db 34941 GGCATCCTCCATGCCTCCTACAGCGTNAAGTAAGG 34975



RESULT 30 

AF312714 

LOCUS AF312714 2470 bp mRNA linear ROD 26-AUG-2002

DEFINITION Rattus norvegicus sterolin (Abcg5) mRNA, complete cds.

ACCESSION AF312714

VERSION AF312714.3 GI:22477143

KEYWORDS

SOURCE Rattus norvegicus (Norway rat)

ORGANISM Rattus norvegicus

Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
Mammalia; Eutheria; Rodentia; Sciurognathi; Muridae; Murinae;
Rattus.

REFERENCE 1 (bases 1 to 2470)

AUTHORS Lee,M.H., Lu,K., Hazard,S., Yu,H., Shulenin,S., Hidaka,H.,
Kojima,H., Allikmets,R., Sakuma,N., Pegoraro,R., Srivastava,A.K.,
Salen,G., Dean,M. and Patel,S.B.

TITLE Identification of a gene, ABCG5, important in the regulation of
dietary cholesterol absorption

JOURNAL Nat. Genet. 27 (1), 79-83 (2001)

MEDLINE 20578753

PUBMED 11138003

REFERENCE 2 (bases 1 to 2470)

AUTHORS Lu,K., Lee,M.-H. and Patel,S.B.

TITLE Direct Submission

JOURNAL Submitted (12-OCT-2000) Division of Endocrinology, Diabetes and
Medical Genetics, Medical University of South Carolina, 114 Doughty
St, STB 541, Charleston, SC 29403, USA

REFERENCE 3 (bases 1 to 2470)

AUTHORS Lu,K., Lee,M.-H. and Patel,S.B.

TITLE Direct Submission

JOURNAL Submitted (16-MAY-2001) Division of Endocrinology, Diabetes and
Medical Genetics, Medical University of South Carolina, 114 Doughty
St, STB 541, Charleston, SC 29403, USA

REMARK Sequence update by submitter

REFERENCE 4 (bases 1 to 2470)

AUTHORS Lu,K., Lee,M. and Patel,S.B.

TITLE Direct Submission

JOURNAL Submitted (26-AUG-2002) Division of Endocrinology, Diabetes and
Medical Genetics, Medical University of South Carolina, 114 Doughty
St, STB 541, Charleston, SC 29403, USA

REMARK Sequence update by submitter

COMMENT On Aug 26, 2002 this sequence version replaced gi:14091945.

FEATURES Location/Qualifiers
source 1..2470

/organism="Rattus norvegicus"
/mol_type="mRNA"
/strain="Sprague-Dawley"
/db_xref="taxon:10116"
/tissue_type="liver"
gene 1..2470

/gene="Abcg5"
CDS 65..2023

/gene="Abcg5"

/note="ABCG5"

/codon_start=1

/product="sterolin"

/protein_id="AAG53098.3"

/db_xref="GI:22477144"

/translation="MSELPFLSPEGARGPHNNRGSQSSLEEGSVTGSEARHSLGVLNV 
SFSVSNRVGPWWNIKSCQQKWDRKILKDVSLYIESGQTMCILGSSGSGKTTLLDAISG 
RLRRTGTLEGEVFVNGCELRRDQFQDCVSYLLQSDVFLSSLTVRETLRYTAMLALRSS 
SADFYDKKVEAVLTELSLSHVADQMIGNYNFGGISSGERRRVSIAAQLLQDPKVMMLD
EPTTGLDCMTANHIVLLLVELARRNRIVIVTIHQPRSELFHHFDKIAILTYGELVFCG 
TPEEMLGFFNNCGYPCPEHSNPFDFYMDLTSVDTQSREREIETYKRVQMLESAFRQSD 
ICHKILENIERTRHLKTLPMVPFKTKNPPGMFCKLGVLLRRVTRNLMRNKQWIMRLV
QNLIMGLFLIFYLLRVQNNMLKGAVQDRVGLLYQLVGATPYTGMLNAVNLFPMLRAVS 
DQESQDGLYQKWQMLLAYVLHALPFSIVATVIFSSVCYWTLGLYPEVARFGYFSAALL 
APHLIGEFLTLVLLGMVQNPNIVNSIVALLSISGLLIGSGFIRNIEEMPIPLKILGYF 
TFQKYCCEILWNEFYGLNFTCGGSNTSVPNNPMCSMTQGIQFIEKTCPGATSRFTTN 
FLILYSFIPTLVILGMWFKVRDYLISR" 

ORIGIN 

Query Match 11.1%; Score 173.6; DB 10; Length 2470; 

Best Local Similarity 86.8%; Pred. No. 4.2e-42; 

Matches 191; Conservative 0; Mismatches 29; Indels 0; Gaps 0;

Qy 359 TTAAAGCTGCCCTGGAGCCGGACAGGCCACTAGAAAATTCACTTGCATTTGCTTCCTGCT 418 

I I I I I I ill III I I I I I I I I I I III III I I I I I I I I I I I I I I I I I I III 
Db 1 TTAAAGTTGCTCTGAAGCCAGACAGGACACCAGAGGATTCACTCACATTTGCTTCCCGCT 60

Qy 419 AGCCATGGGTGAGCTGCCCTTTCTGAGTCCAGAGGGAGCCAGAGGGCCTCACATCAACAG 478

I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I II 
Db 61 GGCCATGAGTGAGCTGCCCTTTCTGAGTCCAGAGGGAGCCAGAGGGCCTCACAACAACAG 120

Qy 479 AGGGTCTCTGAGCTCCCTGGAGCAAGGTTCGGTCACGGGCACAGAGGCTCGGCACAGCTT 538

I I I I I I I I I I I I I I I II I I I I I I I I II II II III I I I I I I I I I I I I I I I I I I I 
Db 121 AGGGTCTCAGAGCTCCCTGGAGGAAGGCTCAGTTACAGGCTCAGAGGCTCGGCACAGCTT 180 

Qy 539 AGGTGTCCTGCATGTGTCCTACAGCGTCAGGTAAGGGGAC 578 

I I II I I II I I I I I I I I I I I I I II I I I I I I III 
Db 181 AGGTGTCCTGAATGTGTCCTTCAGCGTCAGCAACCGTGTC 220



RESULT 31 

AY196216/C 

LOCUS AY196216 2284 bp mRNA linear ROD 01-JUN-2003

DEFINITION Mus musculus strain PERA/Ei ATP-binding cassette sub-family G
member 8 (Abcg8) mRNA, complete cds.

ACCESSION AY196216

VERSION AY196216.1 GI:31322261

KEYWORDS

SOURCE Mus musculus (house mouse)

ORGANISM Mus musculus

Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
Mammalia; Eutheria; Rodentia; Sciurognathi; Muridae; Murinae; Mus.



REFERENCE 1 (bases 1 to 2284)

AUTHORS Wittenburg,H., Lyons,M.A., Li,R., Churchill,G.A., Carey,M.C. and
Paigen,B.

TITLE Primary Roles of FXR and ABCG5/ABCG8 in Cholesterol Gallstone
Susceptibility: Evidence from a Cross of PERA/Ei and I/Ln Inbred
Mice

JOURNAL Unpublished
REFERENCE 2 (bases 1 to 2284)

AUTHORS Lyons,M.A., Wittenburg,H., Walsh,K.A., Carey,M.C. and Paigen,B.
TITLE Direct Submission

JOURNAL Submitted (12-DEC-2002) The Jackson Laboratory, 600 Main Street,
Bar Harbor, ME 04609, USA
FEATURES Location/Qualifiers
source 1..2284

/organism="Mus musculus"

/mol_type="mRNA"

/strain="PERA/Ei"

/db_xref="taxon:10090"

/chromosome="17"

/map="55 cM"

/sex="male"

/tissue_type="liver"
gene 1..2284

/gene="Abcg8"
CDS 102..2120

/gene="Abcg8"

/note="ATP-dependent canalicular cholesterol transporter;
white subfamily"
/codon_start=1

/product="ATP-binding cassette sub-family G member 8"
/protein_id="AAO45096.1"
/db_xref="GI:31322262"

/translation="MAEKTKEETQLWNGTVLQDASGLQDSLFSSESDNSLYFTYSGQS
NTLEVRDLTYQVDIASQVPWFEQLAQFKIPWRSHSSQDSCELGIRNLSFKVRSGQMLA 
IIGSSGCGRASLLDVITGRGHGGKMKSGQIWINGQPSTPQLVRKCVAHVRQHDQLLPN 
LTVRETLAFIAQMRLPRTFSQAQRDKRVEDVIAELRLRQCANTRVGNTYVRGVSGGER 
RRVSIGVQLLWNPGILILDEPTSGLDSFTAHNLVTTLSRLAKGNRLVLISLHQPRSDI 
FRLFDLVLLMTSGTPIYLGAAQQMVQYFTSIGHPCPRYSNPADFYVDLTSIDRRSKER 
EVATVEKAQSLAALFLEKVQGFDDFLWKAEAKELNTSTHTVSLTLTQDTDCGTAVELP 
GMIEQFSTLIRRQISNDFRDLPTLLIHGSEACLMSLIIGFLYYGHGAKQLSFMDTAAL 
LFMIGALIPFNVILDWSKCHSERSMLYYELEDGLYTAGPYFFAKILGELPEHCAYVI
IYAMPIYWLTNLRPVPELFLLHFLLVWLWFCCRTMALAASAMLPTFHMSSFFCNALY 
NSFYLTAGFMINLDNLWIVPAWISKLSFLRWCFSGLMQIQFNGHLYTTQIGNFTFSIL 
GDTMISAMDLNSHPLYAIYLIVIGISYGFLFLYYLSLKLIKQKSIQDW" 

ORIGIN 



Query Match 10.4%; Score 164; DB 10; Length 2284; 

Best Local Similarity 100.0%; Pred. No. 4.2e-39; 

Matches 164; Conservative 0; Mismatches 0; Indels 0; Gaps 0; 

Qy 1 CGAAGCATCCTGAAGTACAGTCCCATTCCACAGCTGGGTCTCTTCTTTGGTTTTCTCAGC 60 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I II I I I I I I I I II I I I I I I I 
Db 164 CGAAGCATCCTGAAGTACAGTCCCATTCCACAGCTGGGTCTCTTCTTTGGTTTTCTCAGC 105 



Qy 61 CATGACCAGTGCTGTTTGTGCCCTTTGTGTGGCCTCCCCTGCTGTTGGGCTCTCTCTGTC 120 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I i I I I I I II I I I I I I II I I I I I I I II I I I I I I I 
Db 104 CATGACCAGTGCTGTTTGTGCCCTTTGTGTGGCCTCCCCTGCTGTTGGGCTCTCTCTGTC 45 



Qy 121 TTTGCTCCTTAGAGCTGGGGCACCTGAGCCCTCCTCTGTGCCAG 164 

I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I 11 I I I I I I 
Db 44 TTTGCTCCTTAGAGCTGGGGCACCTGAGCCCTCCTCTGTGCCAG 1 
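
In hits whose identifier carries the /C suffix, such as this one, the Db coordinates count down (164 to 1) while the Qy coordinates count up. That pattern is consistent with the query aligning to the reverse-complement strand of the database entry. The sketch below shows how such a strand can be derived. It assumes plain IUPAC nucleotide letters, including the ambiguity codes (Y, R, S) that occur in this query, and the function name is hypothetical.

```python
# Complement table for IUPAC nucleotide codes (assumed alphabet; upper case only).
_COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a nucleotide sequence."""
    # Complement each base, then reverse so the result reads 5' to 3'.
    return seq.upper().translate(_COMPLEMENT)[::-1]


if __name__ == "__main__":
    # Last 12 query bases of the alignment above (Qy 153 to 164).
    print(reverse_complement("CCTCTGTGCCAG"))
```

Applying the function twice returns the original sequence, which is a quick sanity check when comparing a complement-strand hit against the forward-strand record.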



RESULT 32 

AY196215/C 

LOCUS AY196215 2285 bp mRNA linear ROD 01-JUN-2003

DEFINITION Mus musculus strain I/LnJ ATP-binding cassette sub-family G member
8 (Abcg8) mRNA, complete cds.

ACCESSION AY196215

VERSION AY196215.1 GI:31322259

KEYWORDS

SOURCE Mus musculus (house mouse)

ORGANISM Mus musculus

Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
Mammalia; Eutheria; Rodentia; Sciurognathi; Muridae; Murinae; Mus.

REFERENCE 1 (bases 1 to 2285)

AUTHORS Wittenburg,H., Lyons,M.A., Li,R., Churchill,G.A., Carey,M.C. and
Paigen,B.

TITLE Primary Roles of FXR and ABCG5/ABCG8 in Cholesterol Gallstone
Susceptibility: Evidence from a Cross of PERA/Ei and I/Ln Inbred
Mice

JOURNAL Unpublished

REFERENCE 2 (bases 1 to 2285)

AUTHORS Lyons,M.A., Wittenburg,H., Walsh,K.A., Carey,M.C. and Paigen,B.

TITLE Direct Submission

JOURNAL Submitted (12-DEC-2002) The Jackson Laboratory, 600 Main Street,
Bar Harbor, ME 04609, USA

FEATURES Location/Qualifiers

source 1..2285

/organism="Mus musculus"

/mol_type="mRNA"

/strain="I/LnJ"

/db_xref="taxon:10090"

/chromosome="17"

/map="55 cM"

/sex="male"

/tissue_type="liver"

gene 1..2285

/gene="Abcg8"

CDS 102..2120

/gene="Abcg8"

/note="ATP-dependent canalicular cholesterol transporter;
white subfamily"
/codon_start=1

/product="ATP-binding cassette sub-family G member 8"
/protein_id="AAO45095.1"
/db_xref="GI:31322260"

/ translation="MAEKTKEETQLWNGTVLQDASGLQDSLFSSESDNSLYFTYSGQS 
NTLEVRDLTYQVDIASQVPWFEQLAQFKIPWRSHSSQDSCELGIRNLSFKVRSGQMLA 
IIGSSGCGRASLLDVITGRGHGGKMKSGQIWINGQPSTPQLVRKCVAHVRQHDQLLPN 
LTVRETLAFIAQMRLPRTFSQAQRDKRVEDVIAELRLRQCANTRVGNTYVRGVSGGER 
RRVSIGVQLLWNPGILILDEPTSGLDSFTAHNLVTTLSRLAKGNRLVLISLHQPRSDI 
FRLFDLVLLMTSGTPIYLGAAQQMVQYFTSIGHPCPRYSNPADFYVDLTSIDRRSKER 
EVATVEKAQSLAALFLEKVQGFDDFLWKAEAKELNTSTHTVSLTLTQDTDCGTAAELP 



GMIEQFSTLIRRQISNDFRDLPTLLIHGSEACLMSLIIGFLYYGHGAKQLSFMDTAAL 
LFMIGALIPFNVILDWSKCHSERSMLYYELEDGLYTAGPYFFAKILGELPEHCAYVI 
IYAMPIYWLTNLRPVPELFLLHLLLWLWFCCRTMALAASAMLPTFHMSSFFCNALY
NSFYLTAGFMINLDNLWIVPAWISKLSFLRWCFSGLMQIQFNGHLYTTQIGNFTFSIL 
GDTMISAMDLNSHPLYAIYLIVTGISYGFLFLYYLSLKLIKQKSIQDW"

ORIGIN 

Query Match 10.4%; Score 164; DB 10; Length 2285; 

Best Local Similarity 100.0%; Pred. No. 4.2e-39; 

Matches 164; Conservative 0; Mismatches 0; Indels 0; Gaps 0; 

Qy 1 CGAAGCATCCTGAAGTACAGTCCCATTCCACAGCTGGGTCTCTTCTTTGGTTTTCTCAGC 60 

I I I I I I I I I I I I I I I 1 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 164 CGAAGCATCCTGAAGTACAGTCCCATTCCACAGCTGGGTCTCTTCTTTGGTTTTCTCAGC 105 

Qy 61 CATGACCAGTGCTGTTTGTGCCCTTTGTGTGGCCTCCCCTGCTGTTGGGCTCTCTCTGTC 120 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 104 CATGACCAGTGCTGTTTGTGCCCTTTGTGTGGCCTCCCCTGCTGTTGGGCTCTCTCTGTC 45 

Qy 121 TTTGCTCCTTAGAGCTGGGGCACCTGAGCCCTCCTCTGTGCCAG 164 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 44 TTTGCTCCTTAGAGCTGGGGCACCTGAGCCCTCCTCTGTGCCAG 1



RESULT 33 

AF324495/C 

LOCUS AF324495 3674 bp mRNA linear ROD 07-AUG-2001

DEFINITION Mus musculus sterolin-2 (Abcg8) mRNA, complete cds.

ACCESSION AF324495

VERSION AF324495.1 GI:15088541

KEYWORDS

SOURCE Mus musculus (house mouse)

ORGANISM Mus musculus

Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
Mammalia; Eutheria; Rodentia; Sciurognathi; Muridae; Murinae; Mus.

REFERENCE 1 (bases 1 to 3674)

AUTHORS Lu,K., Lee,M.H., Hazard,S., Brooks-Wilson,A., Hidaka,H., Kojima,H.,
Ose,L., Stalenhoef,A.F., Mietinnen,T., Bjorkhem,I., Bruckert,E.,
Pandya,A., Brewer,H.B. Jr., Salen,G., Dean,M., Srivastava,A. and
Patel,S.B.

TITLE Two genes that map to the STSL locus cause sitosterolemia: genomic
structure and spectrum of mutations involving sterolin-1 and
sterolin-2, encoded by ABCG5 and ABCG8, respectively

JOURNAL Am. J. Hum. Genet. 69 (2), 278-290 (2001)

MEDLINE 21344600

PUBMED 11452359

REFERENCE 2 (bases 1 to 3674)

AUTHORS Lu,K., Lee,M.-H. and Patel,S.B.

TITLE Direct Submission

JOURNAL Submitted (29-NOV-2000) Division of Endocrinology, Diabetes and
Medical Genetics, Medical University of South Carolina, 114 Doughty
Street, STB541, Charleston, SC 29403, USA

FEATURES Location/Qualifiers

source 1..3674

/organism="Mus musculus"

/mol_type="mRNA"

/strain="C57BL/6"

/db_xref="taxon:10090"

/tissue_type="liver"
gene 1..3674

/gene="Abcg8"
CDS 102..2123

/gene="Abcg8"

/note="ABCG8"

/codon_start=1

/product="sterolin-2"

/protein_id="AAK84079.1"

/db_xref="GI:15088542"

/translation="MAEKTKEETQLWNGTVLQDASQGLQDSLFSSESDNSLYFTYSGQ 
SNTLEVRDLTYQVDIASQVPWFEQLAQFKIPWRSHSSQDSCELGIRNLSFKVRSGQML 
AIIGSSGCGRASLLDVITGRGHGGKMKSGQIWINGQPSTPQLVRKCVAHVRQHDQLLP 
NLTVRETLAFIAQMRLPRTFSQAQRDKRVEDVIAELRLRQCANTRVGNTYVRGVSGGE 
RRRVSIGVQLLWNPGILILDEPTSGLDSFTAHNLVTTLSRLAKGNRLVLISLHQPRSD 
I FRLFDLVLLMT S GT P I YLGAAQQMVQ YFT S I GHP CPRYSNPAD F YVDLT S I DRRS KE 
REVATVEKAQSLAALFLEKVQGFDDFLWKAEAKELNTSTHTVSLTLTQDTDCGTAVEL 
PGMIEQFSTLIRRQISNDFRDLPTLLIHGSEACLMSLIIGFLYYGHGAKQLSFMDTAA 
LLFMI GALI PFNVT LDWSKCHSERSMLYYELEDGLYTAGPYFFAKILGELPEHCAYV 
IIYAMPIYWLTNLRPVPELFLLHFLLWLWFCCRTMALAASAMLPTFHMSSFFCNAL 
YNSFYLTAGFMINLDNLWIVPAWISKLSFLRWCFSGLMQIQFNGHLYTTQIGNFTFSI 
LGDTMISAMDLNSHPLYAI YLIVIGISYGFLFLYYLSLKLIKQKSIQDW" 

ORIGIN 

Query Match 10.4%; Score 164; DB 10; Length 3674; 

Best Local Similarity 100.0%; Pred. No. 4.3e-39; 

Matches 164; Conservative 0; Mismatches 0; Indels 0; Gaps 0; 

Qy 1 CGAAGCATCCTGAAGTACAGTCCCATTCCACAGCTGGGTCTCTTCTTTGGTTTTCTCAGC 60 

I I 1 I I I I M I I M I I I I I I I I I I I I I I I I I I I I 1 I I I ! I I I I I I I I I M I I I I I I I N I I 
Db 164 CGAAGCATCCTGAAGTACAGTCCCATTCCACAGCTGGGTCTCTTCTTTGGTTTTCTCAGC 105 

Qy 61 CATGACCAGTGCTGTTTGTGCCCTTTGTGTGGCCTCCCCTGCTGTTGGGCTCTCTCTGTC 120 

I | I I I I | I I I I II II I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I M I 
Db 104 CATGACCAGTGCTGTTTGTGCCCTTTGTGTGGCCTCCCCTGCTGTTGGGCTCTCTCTGTC 45 

Qy 121 TTTGCTCCTTAGAGCTGGGGCACCTGAGCCCTCCTCTGTGCCAG 164 

I I I I I I I I I I I I I I I II I I I M I I II I I I I I II I I I I I I I I I I I 
Db 44 TTTGCTCCTTAGAGCTGGGGCACCTGAGCCCTCCTCTGTGCCAG 1 



RESULT 34 

BD223287 

LOCUS       BD223287                 226 bp    DNA     linear   PAT 17-JUL-2003
DEFINITION  Toxicological response markers.
ACCESSION   BD223287
VERSION     BD223287.1  GI:33033057
KEYWORDS    JP 2002523112-A/24.
SOURCE      Rattus norvegicus (Norway rat)
  ORGANISM  Rattus norvegicus
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Rodentia; Sciurognathi; Muridae; Murinae;
            Rattus.
REFERENCE   1  (bases 1 to 226)
  AUTHORS   Cunningham,M.J., Zweiger,G.B., Panzer,S.R. and Seilhamer,J.J.
  TITLE     Toxicological response markers
  JOURNAL   Patent: JP 2002523112-A 24 30-JUL-2002;
            INCYTE PHARMACEUTICALS INC
COMMENT     OS   Rattus norvegicus (rat)
            PN   JP 2002523112-A/24
            PD   30-JUL-2002
            PF   27-AUG-1999 JP 2000567743
            PR   28-AUG-1998 US 09/141825, 13-OCT-1998 US 09/172711 PR
            13-OCT-1998 US 09/172108
            PI   MARY JANE CUNNINGHAM, GARY B ZWEIGER, SCOTT R PANZER, JEFFREY J
            PI   SEILHAMER
            PC   C12N15/09,C12Q1/68,G01N33/15,G01N33/50,G01N33/53,G01N33/566,
            PC   G01N37/00,
            PC   G01N37/00,C12N15/00
            CC   Incyte template ID No: 700138117F6
            FH   Key             Location/Qualifiers
            FT   source          1..226
            FT                   /organism='Rattus norvegicus (rat)'.
FEATURES             Location/Qualifiers
     source          1..226
                     /organism="Rattus norvegicus"
                     /mol_type="genomic DNA"
                     /db_xref="taxon:10116"
ORIGIN



Query Match 9.6%; Score 151.2; DB 6; Length 226; 

Best Local Similarity 87.8%; Pred. No. 3.8e-35; 

Matches 165; Conservative 0; Mismatches 23; Indels 0; Gaps 0; 



Qy 391 GAAAATTCACTTGCATTTGCTTCCTGCTAGCCATGGGTGAGCTGCCCTTTCTGAGTCCAG 450 

II I II I II I I I I I I II II I I III MINI I II I I I 1 I II I I I I I I I I 1 I I I I I 
Db 1 GAGGATTCACTCACATTTGCTTCCCGCTGGCCATGAGTGAGCTGCCCTTTCTGAGTCCAG 60 

Qy 451 AGGGAGCCAGAGGGCCTCACATCAACAGAGGGTCTCTGAGCTCCCTGGAGCAAGGTTCGG 510 

II I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I M I I I I I I II I I I I I II I 
Db 61 AGGGAGCCAGAGGGCCTCACAACAACAGAGGGTCTCAGAGCTCCCTGGAGGAAGGCTCAG 120 

Qy 511 TCACGGGCACAGAGGCTCGGCACAGCTTAGGTGTCCTGCATGTGTCCTACAGCGTCAGGT 570 

I II III I I I II I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I II I I I 
Db 121 TTACAGGCTCAGAGGCTCGGCACAGCTTAGGTGTCCTGAATGTGTCCTTCAGCGTCAGCA 180 

Qy 571 AAGGGGAC 578 

I I I I 
Db 181 ACCGTGTC 188 



RESULT 35 

AR121818 

LOCUS       AR121818                 235 bp    DNA     linear   PAT 16-MAY-2001
DEFINITION  Sequence 8 from patent US 6160104.
ACCESSION   AR121818
VERSION     AR121818.  GI:14105394
KEYWORDS
SOURCE      Unknown.
  ORGANISM  Unknown.
            Unclassified.
REFERENCE   1  (bases 1 to 235)
  AUTHORS   Cunningham,M.Jane., Zweiger,G.B., Panzer,S.R. and Seilhamer,J.J.
  TITLE     Markers for peroxisomal proliferators
  JOURNAL   Patent: US 6160104-A 8 12-DEC-2000;
FEATURES             Location/Qualifiers
     source          1..235
                     /organism="unknown"
                     /mol_type="unassigned DNA"
ORIGIN

Query Match 9.6%; Score 150.8; DB 6; Length 235; 

Best Local Similarity 90.4%; Pred. No. 5e-35; 

Matches 161; Conservative 0; Mismatches 17; Indels 0; Gaps 0; 

Qy 391 GAAAATTCACTTGCATTTGCTTCCTGCTAGCCATGGGTGAGCTGCCCTTTCTGAGTCCAG 450 

II I I I I I I I I I I 1 I I I I I I I III I I I I II I I I I I I I I II I I I I I I I I I I I I I I 
Db 1 GAGGATTCACTCACATTTGCTTCCCGCTGGCCATGAGTGAGCTGCCCTTTCTGAGTCCAG 60 

Qy 451 AGGGAGCCAGAGGGCCTCACATCAACAGAGGGTCTCTGAGCTCCCTGGAGCAAGGTTCGG 510 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I 
Db 61 AGGGAGCCAGAGGGCCTCACAACAACAGAGGGTCTCAGAGCTCCCTGGAGGAAGGCTCAG 120 

Qy 511 TCACGGGCACAGAGGCTCGGCACAGCTTAGGTGTCCTGCATGTGTCCTACAGCGTCAG 568 

I II III I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I M I I I I I I I I I 
Db 121 TTACAGGCTCAGAGGCTCGGCACAGCTTAGGTGTCCTGAATGTGTCCTTCAGCGTCAG 178 



RESULT 36 

AX456523 

LOCUS       AX456523                1915 bp    DNA     linear   PAT 06-JUL-2002
DEFINITION  Sequence 45 from Patent WO0227016.
ACCESSION   AX456523
VERSION     AX456523.1  GI:21715412
KEYWORDS
SOURCE      synthetic construct
  ORGANISM  synthetic construct
            artificial sequences.
REFERENCE   1
  AUTHORS   Patel,S.B. and Dean,M.
  TITLE     Gene involved in dietary sterol absorption and excretion and uses
            therefor
  JOURNAL   Patent: WO 0227016-A 45 04-APR-2002;
            THE DEPARTMENT OF HEALTH AND HUMAN SERVICES (US); Patel,
            Shailendra B. (US); Dean, Michael (US)
FEATURES             Location/Qualifiers
     source          1..1915
                     /organism="synthetic construct"
                     /mol_type="unassigned DNA"
                     /db_xref="taxon:32630"
                     /note="Primer"
ORIGIN



Query Match 9.3%; Score 146.4; DB 6; Length 1915; 

Best Local Similarity 96.2%; Pred. No. 1.3e-33; 

Matches 150; Conservative 0; Mismatches 6; Indels 0; Gaps 0; 



Qy 423 ATGGGTGAGCTGCCCTTTCTGAGTCCAGAGGGAGCCAGAGGGCCTCACATCAACAGAGGG 482 

I II I I I I I I I I I 11 I I I I I I I I I I I I I II I 1 I II I I I I I I II I I I I I I I I I I I I I I I I I I 
Db 1 ATGGGTGAGCTGCCCTTTCTGAGTCCAGAGGGAGCCAGAGGGCCTCACATCAACAGAGGG 60 



Qy 483 TCTCTGAGCTCCCTGGAGCAAGGTTCGGTCACGGGCACAGAGGCTCGGCACAGCTTAGGT 542 

I I I I I I I I I I I I I I I I I I I I I I I I I I II I 1 I I I I I I I I I I I I I I I I I I I I I I I I I I I i I I 
Db 61 TCTCTGAGCTCCCTGGAGCAAGGTTCGGTCACGGGCACAGAGGCTCGGCACAGCTTAGGT 120 

Qy 543 GTCCTGCATGTGTCCTACAGCGTCAGGTAAGGGGAC 578 

I I I I II II I I I I I I I I I I I I I I I I I I I I I I 
Db 121 GTCCTGCATGTGTCCTACAGCGTCAGCAACCGTGTC 156 



RESULT 37 

AX685729 

LOCUS       AX685729                1959 bp    DNA     linear   PAT 29-MAR-2003
DEFINITION  Sequence 1 from Patent WO02081691.
ACCESSION   AX685729
VERSION     AX685729.1  GI:29371738
KEYWORDS
SOURCE      Mus musculus (house mouse)
  ORGANISM  Mus musculus
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Rodentia; Sciurognathi; Muridae; Murinae; Mus.
REFERENCE   1
  AUTHORS   Hobbs,H.H., Shan,B., Barnes,R. and Tian,H.
  TITLE     Abcg5 and abcg8: compositions and methods of use
  JOURNAL   Patent: WO 02081691-A 1 17-OCT-2002;
            Tularik Inc. (US); BOARD OF REGENTS UNIVERSITY OF TEXAS SYSTEM
            (US)
FEATURES             Location/Qualifiers
     source          1..1959
                     /organism="Mus musculus"
                     /mol_type="unassigned DNA"
                     /db_xref="taxon:10090"
     CDS             1..1959
                     /note="unnamed protein product; ABCG5 (mABCG5)"
                     /codon_start=1
                     /protein_id="CAD86570.1"
                     /db_xref="GI:29371739"
                     /db_xref="REMTREMBL:CAD86570"
                     /translation="MGELPFLSPEGARGPHINRGSLSSLEQGSVTGTEARHSLGVLHV
                     SYSVSNRVGPWWNIKSCQQKWDRQILKDVSLYIESGQIMCILGSSGSGKTTLLDAISG
                     RLRRTGTLEGEVFVNGCELRRDQFQDCFSYVLQSDVFLSSLTVRETLRYTAMLALCRS
                     SADFYNKKVEAVMTELSLSHVADQMIGSYNFGGISSGERRRVSIAAQLLQDPKVMMLD
                     EPTTGLDCMTANQIVLLLAELARRDRIVIVTIHQPRSELFQHFDKIAILTYGELVFCG
                     TPEEMLGFFNNCGYPCPEHSNPFDFYMDLTSVDTQSREREIETYKRVQMLECAFKESD
                     IYHKILENIERARYLKTLPMVPFKTKDPPGMFGKLGVLLRRVTRNLMRNKQAVIMRLV
                     QNLIMGLFLIFYLLRVQNNTLKGAVQDRVGLLYQLVGATPYTGMLNAVNLFPMLRAVS
                     DQESQDGLYHKWQMLLAYVLHVLPFSVIATVIFSSVCYWTLGLYPEVARFGYFSAALL
                     APHLIGEFLTLVLLGIVQNPNIVNSIVALLSISGLLIGSGFIRNIQEMPIPLKILGYF
                     TFQKYCCEILWNEFYGLNFTCGGSNTSMLNHPMCAITQGVQFIEKTCPGATSRFTAN
                     FLILYGFIPALVILGIVIFKVRDYLISR"
ORIGIN



Query Match 9.3%; Score 146.4; DB 6; Length 1959; 

Best Local Similarity 96.2%; Pred. No. 1.3e-33; 

Matches 150; Conservative 0; Mismatches 6; Indels 0; Gaps 0; 



Qy 423 ATGGGTGAGCTGCCCTTTCTGAGTCCAGAGGGAGCCAGAGGGCCTCACATCAACAGAGGG 482 

I I I I 1 I I I I I 1 I I I I I I I i I I I I I I I I II I I I I I I I II I I I I I I I I I I 1 M I I I I I II II 
Db 1 ATGGGTGAGCTGCCCTTTCTGAGTCCAGAGGGAGCCAGAGGGCCTCACATCAACAGAGGG 60 

Qy 483 TCTCTGAGCTCCCTGGAGCAAGGTTCGGTCACGGGCACAGAGGCTCGGCACAGCTTAGGT 542 

II I I I I I I I II I I I I I I I M I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 61 TCTCTGAGCTCCCTGGAGCAAGGTTCGGTCACGGGCACAGAGGCTCGGCACAGCTTAGGT 120 

Qy 543 GTCCTGCATGTGTCCTACAGCGTCAGGTAAGGGGAC 578 

I I I I I I I I 1 I II I I I I I I I I I I I I I I I I I I 
Db 121 GTCCTGCATGTGTCCTACAGCGTCAGCAACCGTGTC 156 



RESULT 38 

AC084712/c 

LOCUS       AC084712               68166 bp    DNA     linear   HTG 08-NOV-2000
DEFINITION  Homo sapiens chromosome 2 clone RP11-328I4 map 2, LOW-PASS SEQUENCE
            SAMPLING.
ACCESSION   AC084712
VERSION     AC084712.1  GI:11120851
KEYWORDS    HTG; HTGS_PHASE0.
SOURCE      Homo sapiens (human)
  ORGANISM  Homo sapiens
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Primates; Catarrhini; Hominidae; Homo.
REFERENCE   1  (bases 1 to 68166)
  AUTHORS   Birren,B., Linton,L., Nusbaum,C. and Lander,E.
  TITLE     Homo sapiens chromosome 2, clone RP11-328I4
  JOURNAL   Unpublished
REFERENCE   2  (bases 1 to 68166)
  AUTHORS   Birren,B., Linton,L., Nusbaum,C., Lander,E., Abraham,H., Allen,N.,
            Anderson,S., Barna,N., Bastien,V., Beda,F., Boguslavkiy,L.,
            Boukhgalter,B., Brown,A., Burkett,G., Campopiano,A., Castle,A.,
            Choepel,Y., Colangelo,M., Collins,S., Collymore,A., Cooke,P.,
            DeArellano,K., Dewar,K., Diaz,J.S., Dodge,S., Ferreira,P.,
            FitzHugh,W., Gage,D., Galagan,J., Gardyna,S., Ginde,S., Goyette,M.,
            Graham,L., Grand-Pierre,N., Hagos,B., Heaford,A., Horton,L.,
            Iliev,I., Johnson,R., Jones,C., Kann,L., Karatas,A., LaRocque,K.,
            Lamazares,R., Landers,T., Lehoczky,J., Levine,R., Lieu,C., Liu,G.,
            Macdonald,P., Marquis,N., McCarthy,M., McEwan,P., McKernan,K.,
            McPheeters,R., Meldrim,J., Meneus,L., Mihova,T., Mlenga,V.,
            Morrow,J., Murphy,T., Naylor,J., Norman,C.H., O'Connor,T.,
            O'Donnell,P., O'Neil,D., Olivar,T.M., Oliver,J., Peterson,K.,
            Pierre,N., Pisani,C., Pollara,V., Raymond,C., Rieback,M., Riley,R.,
            Rogov,P., Rothman,D., Roy,A., Santos,R., Schauer,S., Severy,P.,
            Sougnez,C., Spencer,B., Stange-Thomann,N., Stojanovic,N.,
            Strauss,N., Subramanian,A., Talamas,J., Tesfaye,S., Theodore,J.,
            Tirrell,A., Travers,M., Trigilio,J., Vassiliev,H., Viel,R., Vo,A.,
            Wilson,B., Wu,X., Wyman,D., Ye,W.J., Young,G., Zainoun,J.,
            Zimmer,A. and Zody,M.
  TITLE     Direct Submission
  JOURNAL   Submitted (08-NOV-2000) Whitehead Institute/MIT Center for Genome
            Research, 320 Charles Street, Cambridge, MA 02141, USA
COMMENT     All repeats were identified using RepeatMasker:
            Smit, A.F.A. & Green, P. (1996-1997)
            http://ftp.genome.washington.edu/RM/RepeatMasker.html
            Genome Center
            Center: Whitehead Institute/ MIT Center for Genome Research
            Center code: WIBR
            Web site: http://www-seq.wi.mit.edu
            Contact: sequence_submissions@genome.wi.mit.edu
            Project Information
            Center project name: L11395
            Center clone name: 328 I 4
            * NOTE: This record contains 83 individual
            * sequencing reads that have not been assembled into
            * contigs. Runs of N are used to separate the reads
            * and the order in which they appear is completely
            * arbitrary. Low-pass sequence sampling is useful for
            * identifying clones that may be gene-rich and allows
            * overlap relationships among clones to be deduced.
            * However, it should not be assumed that this clone
            * will be sequenced to completion. In the event that
            * the record is updated, the accession number will
            * be preserved.
            *          49195 gap of 100 bp
            *    49196 49939 contig of 744 bp in length
            *    49940 50039 gap of 100 bp
            *    50040 50778 contig of 739 bp in length
            *    50779 50878 gap of 100 bp
            *    50879 51619 contig of 741 bp in length
            *    51620 51719 gap of 100 bp
            *    51720 52438 contig of 719 bp in length
            *    52439 52538 gap of 100 bp
            *    52539 53235 contig of 697 bp in length
            *    53236 53335 gap of 100 bp
            *    53336 54028 contig of 693 bp in length
            *    54029 54128 gap of 100 bp
            *    54129 54853 contig of 725 bp in length
            *    54854 54953 gap of 100 bp
            *    54954 55679 contig of 726 bp in length
            *    55680 55779 gap of 100 bp
            *    55780 56519 contig of 740 bp in length
            *    56520 56619 gap of 100 bp
            *    56620 57358 contig of 739 bp in length



Query Match 9.3%; Score 145.4; DB 2; Length 68166; 

Best Local Similarity 72.6%; Pred. No. 3.1e-33; 

Matches 207; Conservative 0; Mismatches 66; Indels 12; Gaps 1; 



Qy 1019 CCAAGGACAACAGAGTGGTACATAAGGCTAAAACAGAGTTGTCAACCTGTCCAGGGGCAA 1078 

I I I I I I I I I I I I I I I II III I I I I I I II I I I I 

Db 10159 CCAGGTTCTACAGAGGAGGGCGCAGAGACTGAAACACGTTAGGAGCCTGTCCGGAGACTA 10100 

Qy 1079 CTGGGATGGGG------------TAGGGCTGGGAGCAGGGGTCTGGCACCTTCCAGGACC 1126 

I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I II 

Db 10099 CTGGGGTGGGGCACAGGTAGGATCAATGCTGGGGACCTGGGTGTGGCCCCTTCCAGGGCC 10040 



Qy 1127 CTACTCTGCCTTTGCCCTTGTGGGATTTCCTTTAAAGCAACCGTGTCGGGCCTTGGTGGA 1186 



II 1 1 1 1 II 1 1 1 1 1 I 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 

Db 10039 CCAAGCTGCCTTTGCCTTCCTGGGGTTTCCTTTAAAGCCACCGCGTGAGGCCCTGGTGGG 9980 

Qy 1187 ACATCAAATCATGCCAGCAGAAGTGGGACAGGCAAATCCTCAAAGATGTCTCCTTGTACA 1246 

MINI III MM II II I I I II II I I I I II II I I I I I II II I I I 1 I I I I I I I 
Db 9979 ACATCACATCTTGCCGGCAGCAGTGGACCAGGCAGATCCTCAAAGATGTCTCCTTGTACG 9920 

Qy 1247 TCGAGAGTGGCCAGATTATGTGCATCTTAGGCAGCTCAGGTAAGT 1291 

I I I I II II II I I I I II I I I II I I I I I II I I I I I I I II I I 
Db 9919 TGGAGAGCGGGCAGATCATGTGCATCCTAGGAAGCTCAGGTAAGT 9875 



RESULT 39 

AX456526 

LOCUS       AX456526                2035 bp    DNA     linear   PAT 06-JUL-2002
DEFINITION  Sequence 48 from Patent WO0227016.
ACCESSION   AX456526
VERSION     AX456526.1  GI:21715414
KEYWORDS
SOURCE      synthetic construct
  ORGANISM  synthetic construct
            artificial sequences.
REFERENCE   1
  AUTHORS   Patel,S.B. and Dean,M.
  TITLE     Gene involved in dietary sterol absorption and excretion and uses
            therefor
  JOURNAL   Patent: WO 0227016-A 48 04-APR-2002;
            THE DEPARTMENT OF HEALTH AND HUMAN SERVICES (US); Patel,
            Shailendra B. (US); Dean, Michael (US)
FEATURES             Location/Qualifiers
     source          1..2035
                     /organism="synthetic construct"
                     /mol_type="unassigned DNA"
                     /db_xref="taxon:32630"
                     /note="Primer"
ORIGIN



Query Match 8.6%; Score 135.8; DB 6; Length 2035; 

Best Local Similarity 89.6%; Pred. No. 2.7e-30; 

Matches 146; Conservative 0; Mismatches 17; Indels 0; Gaps 0; 



Qy 416 GCTAGCCATGGGTGAGCTGCCCTTTCTGAGTCCAGAGGGAGCCAGAGGGCCTCACATCAA 475 

Ml I I M I II I I I I I I I I I I I I II II I II II II I I II I I II II M II M II I M I III 
Db 1 GCTGGCCATGGGTGAGCTGCCCTTTCTGAGTCCAGAGGGAGCCAGAGGGCCTCACAACAA 60 

Qy 476 CAGAGGGTCTCTGAGCTCCCTGGAGCAAGGTTCGGTCACGGGCACAGAGGCTCGGCACAG 535 

I I I II M II I I I I I I I I I M I I I I II I I M II II III I I I I I I I I I I I I M II 
Db 61 CAGAGGGTCTCAGAGCTCCCTGGAGGAAGGCTCAGTTACAGGCTCAGAGGCTCGGCACAG 120 

Qy 536 CTTAGGTGTCCTGCATGTGTCCTACAGCGTCAGGTAAGGGGAC 578 

II II I I M I I I I I I I II I I I II II I II I II I I I I I 
Db 121 CTTAGGTGTCCTGAATGTGTCCTTCAGCGTCAGCAACCGTGTC 163 



RESULT 40 
AX456520 

LOCUS       AX456520                2516 bp    DNA     linear   PAT 06-JUL-2002
DEFINITION  Sequence 42 from Patent WO0227016.
ACCESSION   AX456520
VERSION     AX456520.1  GI:21715410
KEYWORDS
SOURCE      synthetic construct
  ORGANISM  synthetic construct
            artificial sequences.
REFERENCE   1
  AUTHORS   Patel,S.B. and Dean,M.
  TITLE     Gene involved in dietary sterol absorption and excretion and uses
            therefor
  JOURNAL   Patent: WO 0227016-A 42 04-APR-2002;
            THE DEPARTMENT OF HEALTH AND HUMAN SERVICES (US); Patel,
            Shailendra B. (US); Dean, Michael (US)
FEATURES             Location/Qualifiers
     source          1..2516
                     /organism="synthetic construct"
                     /mol_type="unassigned DNA"
                     /db_xref="taxon:32630"
                     /note="Primer"
ORIGIN



Query Match 6.8%; Score 107; DB 6; Length 2516; 

Best Local Similarity 63.6%; Pred. No. 2.6e-21; 

Matches 180; Conservative 0; Mismatches 100; Indels 3; Gaps 1; 



Qy 294 CTGTTATCTCACGAGGATTCCAGGGCTGGGTAGGATCGGACAGGGCACTCCCATTGGCTC 353 

I I I I I I I I I I II I I I II I II I I I I I I I I II I 
Db 12 CTGCTGTCCCAAGGGACTCCGGGGTCAGGTGGAGCAGGCAGGGCAGTCTGCCACGGGCTC 71 

Qy 354 CTCAGTTAAAGCTGCCCTGGAGCCGGACAGGCCACTAGAAAATTCACTTGCATTTGCTTC 413 

III I I I I I I I I 1 I I I I I I II I I I II I I I I I I I I I I I I I I 
Db 72 CCCAACTGAAGCCACTCTGGGGAGGGTCCGGCCACCAGAAAATTTGCCCAGCTTTGCTGC 131 

Qy 414 CTGCTAGCCATGGGTGAGCTGCCCTTTCTGAGTCCAGAGGGAGCCAGAGGGCCTCACATC 473 

I I I I I 1 I I I I I I I I I I I I I I I I I III II III III II I 
Db 132 CTGTTGGCCATGGGTGACCTCTCATCTTTGACCCCCGGAGGGTCCATGGGTCTCCAAGTA 191 

Qy 474 AACAGAGGGTCTCTGAGCTCCCTGGAGCAAGGTTCGGTCACGGGCACAGAGGCTCGGCAC 533 

I I I I I II I II I I I I I I I I I I I I I I I I I I III I I I Ml II Ml 
Db 192 AACAGAGGCTCCCAGAGCTCCCTGGAGGGGGCTCCTGCCACCGCCCCGGAGCCT---CAC 248 

Qy 534 AGCTTAGGTGTCCTGCATGTGTCCTACAGCGTCAGGTAAGGGG 576 

I I I I II I I I I I I I I I I I I I I I I II I I I I I II 
Db 249 AGCCTGGGCATCCTCCATGCCTCCTACAGCGTCAGCCACCGCG 291 



RESULT 41 

AF312715 

LOCUS       AF312715                2740 bp    mRNA    linear   PRI 14-JUN-2001
DEFINITION  Homo sapiens sterolin (ABCG5) mRNA, complete cds.
ACCESSION   AF312715
VERSION     AF312715.2  GI:14423628
KEYWORDS
SOURCE      Homo sapiens (human)
  ORGANISM  Homo sapiens
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Primates; Catarrhini; Hominidae; Homo.
REFERENCE   1  (bases 1 to 2740)
  AUTHORS   Lee,M.H., Lu,K., Hazard,S., Yu,H., Shulenin,S., Hidaka,H.,
            Kojima,H., Allikmets,R., Sakuma,N., Pegoraro,R., Srivastava,A.K.,
            Salen,G., Dean,M. and Patel,S.B.
  TITLE     Identification of a gene, ABCG5, important in the regulation of
            dietary cholesterol absorption
  JOURNAL   Nat. Genet. 27 (1), 79-83 (2001)
  MEDLINE   20578753
   PUBMED   11138003
REFERENCE   2  (bases 1 to 2740)
  AUTHORS   Lu,K., Lee,M.-H. and Patel,S.B.
  TITLE     Direct Submission
  JOURNAL   Submitted (12-OCT-2000) Division of Endocrinology, Diabetes and
            Medical Genetics, Medical University of South Carolina, 114 Doughty
            St, STB541, Charleston, SC 29403, USA
COMMENT     On Jun 14, 2001 this sequence version replaced gi:12382303.
FEATURES             Location/Qualifiers
     source          1..2740
                     /organism="Homo sapiens"
                     /mol_type="mRNA"
                     /db_xref="taxon:9606"
                     /chromosome="2"
                     /map="2p21; between D2S2294 and D2S2298"
                     /tissue_type="liver"
     gene            1..2740
                     /gene="ABCG5"
     CDS             141..2096
                     /gene="ABCG5"
                     /codon_start=1
                     /product="sterolin"
                     /protein_id="AAG53099.1"
                     /db_xref="GI:12382304"
                     /translation="MGDLSSLTPGGSMGLQVNRGSQSSLEGAPATAPEPHSLGILHAS
                     YSVSHRVRPWWDITSCRQQWTRQILKDVSLYVESGQIMCILGSSGSGKTTLLDAMSGR
                     LGRAGTFLGEVYVNGRALRREQFQDCFSYVLQSDTLLSSLTVRETLHYTALLAIRRGN
                     PGSFQKKVEAVMAELSLSHVADRLIGNYSLGGISTGERRRVSIAAQLLQDPKVMLFDE
                     PTTGLDCMTANQIVVLLVELARRNRIVVLTIHQPRSELFQLFDKIAILSFGELIFCGT
                     PAEMLDFFNDCGYPCPEHSNPFDFYMDLTSVDTQSKEREIETSKRVQMIESAYKKSAI
                     CHKTLKNIERMKHLKTLPMVPFKTKDSPGVFSKLGVLLRRVTRNLVRNKLAVITRLLQ
                     NLIMGLFLLFFVLRVRSNVLKGAIQDRVGLLYQFVGATPYTGMLNAVNLFPVLRAVSD
                     QESQDGLYQKWQMMLAYALHVLPFSWATMIFSSVCYWTLGLHPEVARFGYFSAALLA
                     PHLIGEFLTLVLLGIVQNPNIVNSWALLSIAGVLVGSGFLRNIQEMPIPFKIISYFT
                     FQKYCSEILWNEFYGLNFTCGSSNVSVTTNPMCAFTQGIQFIEKTCPGATSRFTMNF
                     LILYSFIPALVILGIWFKIRDHLISR"
ORIGIN



Query Match 6.8%; Score 107; DB 9; Length 2740; 

Best Local Similarity 63.6%; Pred. No. 2.6e-21; 

Matches 180; Conservative 0; Mismatches 100; Indels 3; Gaps 1; 



Qy 294 CTGTTATCTCACGAGGATTCCAGGGCTGGGTAGGATCGGACAGGGCACTCCCATTGGCTC 353 

I I I I I I I I I I II I I I I I I II I I I I I I Mill 
Db 12 CTGCTGTCCCAAGGGACTCCGGGGTCAGGTGGAGCAGGCAGGGCAGTCTGCCACGGGCTC 71 

Qy 354 CTCAGTTAAAGCTGCCCTGGAGCCGGACAGGCCACTAGAAAATTCACTTGCATTTGCTTC 413 

III I I I I I I I I I I I I I I I I M I I I I II I I I I I I I I I I I I 
Db 72 CCCAACTGAAGCCACTCTGGGGAGGGTCCGGCCACCAGAAAATTTGCCCAGCTTTGCTGC 131 

Qy 414 CTGCTAGCCATGGGTGAGCTGCCCTTTCTGAGTCCAGAGGGAGCCAGAGGGCCTCACATC 473 

I i I I I I I I II I II I I I I I I I I I I III M III III II I 
Db 132 CTGTTGGCCATGGGTGACCTCTCATCTTTGACCCCCGGAGGGTCCATGGGTCTCCAAGTA 191 

Qy 474 AACAGAGGGTCTCTGAGCTCCCTGGAGCAAGGTTCGGTCACGGGCACAGAGGCTCGGCAC 533 

I I I I I I II I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I III 
Db 192 AACAGAGGCTCCCAGAGCTCCCTGGAGGGGGCTCCTGCCACCGCCCCGGAGCCT---CAC 248 

Qy 534 AGCTTAGGTGTCCTGCATGTGTCCTACAGCGTCAGGTAAGGGG 576 

I II I I I I I I I I I I I I I I I I II I I I I I I I I II 
Db 249 AGCCTGGGCATCCTCCATGCCTCCTACAGCGTCAGCCACCGCG 291 



RESULT 42 

AX320886 

LOCUS       AX320886                 249 bp    DNA     linear   PAT 14-DEC-2001
DEFINITION  Sequence 7 from Patent WO0179272.
ACCESSION   AX320886
VERSION     AX320886.1  GI:17902435
KEYWORDS
SOURCE      Homo sapiens (human)
  ORGANISM  Homo sapiens
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Primates; Catarrhini; Hominidae; Homo.
REFERENCE   1
  AUTHORS   Tian,H., Schultz,J. and Shan,B.
  TITLE     Sitosterolemia susceptibility gene (ssg): compositions and methods
            of use
  JOURNAL   Patent: WO 0179272-A 7 25-OCT-2001;
            Tularik Inc. (US)
FEATURES             Location/Qualifiers
     source          1..249
                     /organism="Homo sapiens"
                     /mol_type="unassigned DNA"
                     /db_xref="taxon:9606"
                     /note="exon 1 of hSSG"
ORIGIN



Query Match 6.5%; Score 101.6; DB 6; Length 249; 

Best Local Similarity 68.4%; Pred. No. 1.2e-19; 

Matches 156; Conservative 0; Mismatches 69; Indels 3; Gaps 1; 



Qy 341 CTCCCATTGGCTCCTCAGTTAAAGCTGCCCTGGAGCCGGACAGGCCACTAGAAAATTCAC 400 

I I I I I I I I I I I I I I I I 1 I I I I I I I I I I I I I I I I I I I I I II I I 
Db 25 CTGCCACGGGCTCCCCAACTGAAGCCACTCTGGGGAGGGTCCGGCCACCAGAAAATTTGC 84 

Qy 401 TTGCATTTGCTTCCTGCTAGCCATGGGTGAGCTGCCCTTTCTGAGTCCAGAGGGAGCCAG 460 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I III II Ml 
Db 85 CCAGCTTTGCTGCCTGTTGGCCATGGGTGACCTCTCATCTTTGACCCCCGGAGGGTCCAT 144 

Qy 461 AGGGCCTCACATCAACAGAGGGTCTCTGAGCTCCCTGGAGCAAGGTTCGGTCACGGGCAC 520 

III II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 145 GGGTCTCCAAGTAAACAGAGGCTCCCAGAGCTCCCTGGAGGGGGCTCCTGCCACCGCCCC 204 

Qy 521 AGAGGCTCGGCACAGCTTAGGTGTCCTGCATGTGTCCTACAGCGTCAG 568 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 1 M 
Db 205 GGAGCCT---CACAGCCTGGGCATCCTCCATGCCTCCTACAGCGTCAG 249 



RESULT 43 

AX320883 

LOCUS       AX320883                2340 bp    DNA     linear   PAT 14-DEC-2001
DEFINITION  Sequence 4 from Patent WO0179272.
ACCESSION   AX320883
VERSION     AX320883.1  GI:17902433
KEYWORDS
SOURCE      Homo sapiens (human)
  ORGANISM  Homo sapiens
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Primates; Catarrhini; Hominidae; Homo.
REFERENCE   1
  AUTHORS   Tian,H., Schultz,J. and Shan,B.
  TITLE     Sitosterolemia susceptibility gene (ssg): compositions and methods
            of use
  JOURNAL   Patent: WO 0179272-A 4 25-OCT-2001;
            Tularik Inc. (US)
FEATURES             Location/Qualifiers
     source          1..2340
                     /organism="Homo sapiens"
                     /mol_type="unassigned DNA"
                     /db_xref="taxon:9606"
                     /note="human sitosterolemia gene (SSG)"
     CDS             107..2062
                     /note="unnamed protein product; human sitosterolemia
                     susceptibility gene (SSG) protein"
                     /codon_start=1
                     /protein_id="CAD19409.1"
                     /db_xref="GI:17902434"
                     /db_xref="REMTREMBL:CAD19409"
                     /translation="MGDLSSLTPGGSMGLQVNRGSQSSLEGAPATAPEPHSLGILHAS
                     YSVSHRVRPWWDITSCRQQWTRQILKDVSLYVESGQIMCILGSSGSGKTTLLDAMSGR
                     LGRAGTFLGEVYWGRALRREQFQDCFSWLQSDTLLSSLTVRETLHYTALIAIRRGN
                     PGSFQKKVEAVMAELSLSHVADRLIGNYSLGGISTGERRRVSIAAQLLQDPKVMLFDE
                     PTTGLDCMTANQIWLLVELARRNRIWLTIHQPRSELFQLFDKIAILSFGELIFCGT
                     PAEMLDFFNDCGYPCPEHSNPFDFYMDLTSVDTQSKEREIETSKRVQMIESAYKKSAI
                     CHKTLKNIERMKHLKTLPMVPFKTKDSPGVFSKLGVLLRRVTRNLVRNKLAVITRLLQ
                     NLIMGLFLLFFVLRVRSNVLKGAIQDRVGLLYQFVGATPYTGMLNAVNLFPVLRAVSD
                     QESQDGLYQKWQMMLAYALHVLPFSWATMIFSSVCYWTLGLHPEVARFGYFSAALLA
                     PHLIGEFLTLVLLGIVQNPNIVNSWALLSIAGVLVGSGFLRNIQEMPIPFKIISYFT
                     FQKYCSEILWNEFYGLNFTCGSSNVSVTTNPMCAFTQGIQFIEKTCPGATSRFTMNF
                     LILYSFIPALVILGIWFKIRDHLISR"
ORIGIN



Query Match 6.5%; Score 101.6; DB 6; Length 2340; 

Best Local Similarity 67.4%; Pred. No. 1.3e-19; 

Matches 159; Conservative 0; Mismatches 74; Indels 3; Gaps 1; 



Qy 341 CTCCCATTGGCTCCTCAGTTAAAGCTGCCCTGGAGCCGGACAGGCCACTAGAAAATTCAC 400 

Mill I I I I I I II I I I II I MM I I I I I II I II I I I II I M I 
Db 25 CTGCCACGGGCTCCCCAACTGAAGCCACTCTGGGGAGGGTCCGGCCACCAGAAAATTTGC 84 

Qy 401 TTGCATTTGCTTCCTGCTAGCCATGGGTGAGCTGCCCTTTCTGAGTCCAGAGGGAGCCAG 460 

I II 1 1 1 1 1 1 1 I 1 1 1 1 1 1 11 1 1 1 1 1 I I I 111 III II III 
Db 85 CCAGCTTTGCTGCCTGTTGGCCATGGGTGACCTCTCATCTTTGACCCCCGGAGGGTCCAT 144 

Qy 461 AGGGCCTCACATCAACAGAGGGTCTCTGAGCTCCCTGGAGCAAGGTTCGGTCACGGGCAC 520 

III II I I I I I I I I! I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 145 GGGTCTCCAAGTAAACAGAGGCTCCCAGAGCTCCCTGGAGGGGGCTCCTGCCACCGCCCC 204 

Qy 521 AGAGGCTCGGCACAGCTTAGGTGTCCTGCATGTGTCCTACAGCGTCAGGTAAGGGG 576 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II 
Db 205 GGAGCCT---CACAGCCTGGGCATCCTCCATGCCTCCTACAGCGTCAGCCACCGCG 257 



RESULT 44 

AX685733 

LOCUS       AX685733                2340 bp    DNA     linear   PAT 29-MAR-2003
DEFINITION  Sequence 5 from Patent WO02081691.
ACCESSION   AX685733
VERSION     AX685733.1  GI:29371742
KEYWORDS
SOURCE      Homo sapiens (human)
  ORGANISM  Homo sapiens
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Primates; Catarrhini; Hominidae; Homo.
REFERENCE   1
  AUTHORS   Hobbs,H.H., Shan,B., Barnes,R. and Tian,H.
  TITLE     Abcg5 and abcg8: compositions and methods of use
  JOURNAL   Patent: WO 02081691-A 5 17-OCT-2002;
            Tularik Inc. (US); BOARD OF REGENTS UNIVERSITY OF TEXAS SYSTEM
            (US)
FEATURES             Location/Qualifiers
     source          1..2340
                     /organism="Homo sapiens"
                     /mol_type="unassigned DNA"
                     /db_xref="taxon:9606"
     CDS             107..2062
                     /note="unnamed protein product; human ABCG5 (hABCG5)"
                     /codon_start=1
                     /protein_id="CAD86572.1"
                     /db_xref="GI:29371743"
                     /db_xref="REMTREMBL:CAD86572"
                     /translation="MGDLSSLTPGGSMGLQVNRGSQSSLEGAPATAPEPHSLGILHAS
                     YSVSHRVRPWWDITSCRQQWTRQILKDVSLYVESGQIMCILGSSGSGKTTLLDAMSGR
                     LGRAGTFLGEVYVNGRALRREQFQDCFSYVLQSDTLLSSLTVRETLHYTALLAIRRGN
                     PGSFQKKVEAVMAELSLSHVADRLIGNYSLGGISTGERRRVSIAAQLLQDPKVMLFDE
                     PTTGLDCMTANQIWLLVELARRNRIWXTIHQPRSELFQLFDKIAILSFGELIFCGT
                     PAEMLDFFNDCGYPCPEHSNPFDFYMDLTSVDTQSKEREIETSKRVQMIESAYKKSAI
                     CHKTLKNIERMKHLKTLPMVPFKTKDSPGVFSKLGVLLRRVTRNLVRNKLAVITRLLQ
                     NLIMGLFLLFFVLRVRSNVLKGAIQDRVGLLYQFVGATPYTGMLNAVNLFPVLRAVSD
                     QESQDGLYQKWQMMLAYALHVLPFSWATMIFSSVCYWTLGLHPEVARFGYFSAALLA
                     PHLIGEFLTLVLLGIVQNPNIVNSWALLSIAGVLVGSGFLRNIQEMPIPFKIISYFT
                     FQKYCSEILWNEFYGLNFTCGSSNVSVTTNPMCAFTQGIQFIEKTCPGATSRFTMNF
                     LILYSFIPALVILGIWFKIRDHLISR"
ORIGIN



Query Match 6.5%; Score 101.6; DB 6; Length 2340; 

Best Local Similarity 67.4%; Pred. No. 1.3e-19; 

Matches 159; Conservative 0; Mismatches 74; Indels 3; Gaps 1; 



Qy 341 CTCCCATTGGCTCCTCAGTTAAAGCTGCCCTGGAGCCGGACAGGCCACTAGAAAATTCAC 400 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I 
Db 25 CTGCCACGGGCTCCCCAACTGAAGCCACTCTGGGGAGGGTCCGGCCACCAGAAAATTTGC 84 

Qy 401 TTGCATTTGCTTCCTGCTAGCCATGGGTGAGCTGCCCTTTCTGAGTCCAGAGGGAGCCAG 460 

I I I I I I I I I I I II I I II I I I I I II I I I I I I III II III 
Db 85 CCAGCTTTGCTGCCTGTTGGCCATGGGTGACCTCTCATCTTTGACCCCCGGAGGGTCCAT 144 

Qy 461 AGGGCCTCACATCAACAGAGGGTCTCTGAGCTCCCTGGAGCAAGGTTCGGTCACGGGCAC 520 

III II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 145 GGGTCTCCAAGTAAACAGAGGCTCCCAGAGCTCCCTGGAGGGGGCTCCTGCCACCGCCCC 204 

Qy 521 AGAGGCTCGGCACAGCTTAGGTGTCCTGCATGTGTCCTACAGCGTCAGGTAAGGGG 576 

II I I I I I I I II I I I I I II I I I I I I I I I I I I I I I I I I I II 
Db 205 GGAGCCT---CACAGCCTGGGCATCCTCCATGCCTCCTACAGCGTCAGCCACCGCG 257 



RESULT 45 

AF320293 

LOCUS 

DEFINITION 

ACCESSION 

VERSION 

KEYWORDS 

SOURCE 

ORGANISM 



REFERENCE 
AUTHORS 

TITLE 

JOURNAL 
REFERENCE 
AUTHORS 

TITLE 
JOURNAL 



FEATURES 

source 



AF320293 

Homo sapiens ABCG5 
AF320293 

AF32 0293. 1 GI: 11692799 



gene 
CDS 



2340 bp mRNA linear 
ABCG5) mRNA, complete cds . 



PRI 13-DEC-2000 



Homo sapiens (human) 
Homo sapiens 

Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleos tomi ; 
Mammalia; Eutheria; Primates; Catarrhini; Hominidae; Homo. 

1 (bases 1 to 2340) 

Berge,K.E., Tian,H., Graf, G. A., Yu,L., Grishin, N . V. , Schultz,J., 

Kwiterovich, P . , Shan,B., Barnes, R. and Hobbs,H.H. 

Accumulation of Dietary Cholesterol in Sitosterolemia Caused by 

Mutations in Adjacent ABC Transporters 

Science (2001) In press 

2 (bases 1 to 2340) 

Berge,K.E., Tian,H., Graf, G. A., Yu,L., Grishin, N .V. , Schultz,J., 
Kwiterovich, P . , Shan,B., Barnes, R. and Hobbs , H . H . 
Direct Submission 

Submitted ( 09-NOV-2000 ) Molecular Genetics, University of Texas, 
Southwestern Medical Center at Dallas, 5323 Harry Hines Blvd., 
Dallas, TX 75390-9046, USA 

Location/ Qualifiers 

1. .2340 

/organism="Homo sapiens" 

/mol_type= n mRNA" 

/db_xref="taxon: 9606" 

1. .2340 

/gene="ABCG5" 

107. .2062 

/gene="ABCG5" 

/note="ATP-binding cassette, subfamily G, member 5" 

/ codon_start=l 

/product="ABCG5" 

/protein_id="AAG40003 .1" 

/db_xref="GI: 11692800" 

/ translation="MGDLSSLTPGGSMGLQVNRGSQSSLEGAPATAPEPHSLGILHAS 



YSVSHRVRPWWDITSCRQQWTRQILKDVSLYVESGQIMCILGSSGSGKTTLLDAMSGR 
LGRAGTFLGEVTWGRALRREQFQDCFSYVLQSDTLLSSLTVRETLHYTALLAIRRGN 
PGSFQKKVEAVMAELSLSHVADRLIGNYSLGGISTGERRRVSIAAQLLQDPKVMLFDE 
PTTGLDCMTANQIWLLVELARRNRIWLTIHQPRSELFQLFDKIAILSFGELIFCGT 
PAEMLDFFNDCGYPCPEHSNPFDFYMDLTSVDTQSKEREIETSKRVQMIESAYKKSAI 
CHKTLKNIERMKHLKTLPMVPFKTKDSPGVFSKLGVLLRRVTRNLVRNKLAVITRLLQ 
NLIMGLFLLFFVLRVRSNVLKGAIQDRVGLLYQFVGATPYTGMLNAVNLFPVLRAVSD 
QESQDGLYQKWQMMIAYALHVLPFSWATMIFSSVCYWTLGLHPEVARFGYFSAALLA 
PHLIGEFLTLVLLGIVQNPNIVNSW7VLLSIAGVLVGSGFLRNIQEMPIPFKIISYFT 
FQKYCSEILVWEFYGLNFTCGSSNVSVTTNPMCAFTQGIQFIEKTCPGATSRFTMNF 
LILYSFIPALVILGIWFKIRDHLISR" 



ORIGIN 



Query Match 6.5%; Score 101.6; DB 9; Length 2340; 

Best Local Similarity 67.4%; Pred. No. 1.3e-19; 

Matches 159; Conservative 0; Mismatches 74; Indels 3; Gaps 1; 

Qy 341 CTCCCATTGGCTCCTCAGTTAAAGCTGCCCTGGAGCCGGACAGGCCACTAGAAAATTCAC 400 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II II I II I I 
Db 25 CTGCCACGGGCTCCCCAACTGAAGCCACTCTGGGGAGGGTCCGGCCACCAGAAAATTTGC 84 

Qy 401 TTGCATTTGCTTCCTGCTAGCCATGGGTGAGCTGCCCTTTCTGAGTCCAGAGGGAGCCAG 460 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I III II III 
Db 85 CCAGCTTTGCTGCCTGTTGGCCATGGGTGACCTCTCATCTTTGACCCCCGGAGGGTCCAT 144 

Qy 461 AGGGCCTCACATCAACAGAGGGTCTCTGAGCTCCCTGGAGCAAGGTTCGGTCACGGGCAC 520 

III II I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 145 GGGTCTCCAAGTAAACAGAGGCTCCCAGAGCTCCCTGGAGGGGGCTCCTGCCACCGCCCC 204 

Qy 521 AGAGGCTCGGCACAGCTTAGGTGTCCTGCATGTGTCCTACAGCGTCAGGTAAGGGG 576 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I II 
Db 205 GGAGCCT CACAGCCTGGGCATCCTCCATGCCTCCTACAGCGTCAGCCACCGCG 257 
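
The two percentages in each result header appear to follow from the counts printed next to them. The sketch below assumes that Query Match is the score divided by the perfect score (1570 for this query) and that Best Local Similarity is the number of matches divided by all aligned columns (matches + mismatches + indels). Both formulas are inferred from the figures in this report rather than taken from GenCore documentation, and the function names are illustrative.

```python
def query_match_pct(score: float, perfect_score: float) -> float:
    """Score as a percentage of the perfect (self-match) score."""
    return round(100.0 * score / perfect_score, 1)


def best_local_similarity_pct(matches: int, mismatches: int, indels: int) -> float:
    """Identical columns as a percentage of all aligned columns."""
    columns = matches + mismatches + indels
    return round(100.0 * matches / columns, 1)


# RESULT 45 (AF320293): Score 101.6, Matches 159, Mismatches 74, Indels 3
print(query_match_pct(101.6, 1570))
print(best_local_similarity_pct(159, 74, 3))
```

Under these assumptions the same two functions also reproduce the percentages printed for RESULT 46 and RESULT 48.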



RESULT 46 

AX456519 

LOCUS 

DEFINITION 

ACCESSION 

VERSION 

KEYWORDS 

SOURCE 

ORGANISM 

REFERENCE 
AUTHORS 
TITLE 

JOURNAL 



FEATURES 

source 



PAT 06-JUL-2002 



AX456519 1920 bp DNA linear 

Sequence 41 from Patent WO0227016. 

AX456519 

AX456519.1 GI:21715409 



synthetic construct 
synthetic construct 
artificial sequences. 
1 

Patel,S.B. and Dean,M. 

Gene involved in dietary sterol absorption and excretion and uses 
therefor 

Patent: WO 0227016-A 41 04-APR-2002; 

THE DEPARTMENT OF HEALTH AND HUMAN SERVICES (US) ; Patel, 
Shailendra B. (US) ; Dean, Michael (US) 

Location/Qualifiers 

1..1920 

/organism="synthetic construct" 
/mol_type="unassigned DNA" 
/db_xref="taxon:32630" 
/note="Primer" 



ORIGIN 



Query Match 5.9%; Score 93; DB 6; Length 1920; 

Best Local Similarity 84.0%; Pred. No. 6.1e-17; 

Matches 105; Conservative 0; Mismatches 20; Indels 0; Gaps 0; 

Qy 1162 AGCAACCGTGTCGGGCCTTGGTGGAACATCAAATCATGCCAGCAGAAGTGGGACAGGCAA 1221 

III I I I I II I I I I II I I I I MINI I II I I I I I I I I I I I I I I I I I II 
Db 106 AGCCACCGCGTGAGGCCCTGGTGGGACATCACATCTTGCCGGCAGCAGTGGACCAGGCAG 165 

Qy 1222 ATCCTCAAAGATGTCTCCTTGTACATCGAGAGTGGCCAGATTATGTGCATCTTAGGCAGC 1281 

I I I I I I I I I I I I I II I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I II I 
Db 166 ATCCTCAAAGATGTCTCCTTGTACGTGGAGAGCGGGCAGATCATGTGCATCCTAGGAAGC 225 

Qy 1282 TCAGG 1286 

I I I I I 

Db 226 TCAGG 230 



RESULT 47 

AX320887 

LOCUS 

DEFINITION 

ACCESSION 

VERSION 

KEYWORDS 

SOURCE 

ORGANISM 



REFERENCE 
AUTHORS 
TITLE 

JOURNAL 

FEATURES 

source 



AX320887 122 bp DNA linear PAT 14-DEC-2001 

Sequence 8 from Patent WO0179272. 

AX320887 

AX320887.1 GI:17902436 

Homo sapiens (human) 
Homo sapiens 

Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; 

Mammalia; Eutheria; Primates; Catarrhini; Hominidae; Homo. 

1 

Tian,H., Schultz,J. and Shan,B. 

Sitosterolemia susceptibility gene (ssg) : compositions and methods 
of use 

Patent: WO 0179272-A 8 25-OCT-2001; 
Tularik Inc. (US) 

Location/Qualifiers 

1..122 

/organism="Homo sapiens" 
/mol_type="unassigned DNA" 
/db_xref="taxon:9606" 
/note="exon 2 of hSSG" 



ORIGIN 



Query Match 5.7%; Score 90; DB 6; Length 122; 

Best Local Similarity 83.6%; Pred. No. 4.7e-16; 

Matches 102; Conservative 0; Mismatches 20; Indels 0; Gaps 0; 

Qy 1164 CAACCGTGTCGGGCCTTGGTGGAACATCAAATCATGCCAGCAGAAGTGGGACAGGCAAAT 1223 

I I I I I II I I I I I I I I I I I I I I I I Ml I I I I I I I I I I I I I I I I I II II 
Db 1 CCACCGCGTGAGGCCCTGGTGGGACATCACATCTTGCCGGCAGCAGTGGACCAGGCAGAT 60 

Qy 1224 CCTCAAAGATGTCTCCTTGTACATCGAGAGTGGCCAGATTATGTGCATCTTAGGCAGCTC 1283 

I I I I I I I I I I I I I I I II I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I 
Db 61 CCTCAAAGATGTCTCCTTGTACGTGGAGAGCGGGCAGATCATGTGCATCCTAGGAAGCTC 120 



Qy 1284 AG 1285 

Db 121 AG 122 



RESULT 48 

AF351785/c 

LOCUS 

DEFINITION 

ACCESSION 

VERSION 

KEYWORDS 

SOURCE 

ORGANISM 



REFERENCE 
AUTHORS 



TITLE 



JOURNAL 
MEDLINE 
PUBMED 
REFERENCE 
AUTHORS 
TITLE 

JOURNAL 
REFERENCE 
AUTHORS 
TITLE 
JOURNAL 



REFERENCE 
AUTHORS 
TITLE 
JOURNAL 



REMARK 
COMMENT 
FEATURES 

source 



gene 
CDS 



AF351785 4829 bp mRNA linear ROD 26-AUG-2002 

Rattus norvegicus sterolin-2 (Abcg8) mRNA, complete cds. 

AF351785 

AF351785.2 GI: 22477145 

Rattus norvegicus (Norway rat) 
Rattus norvegicus 

Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; 
Mammalia; Eutheria; Rodentia; Sciurognathi; Muridae; Murinae; 
Rattus. 

1 (bases 1 to 4829) 

Lu,K., Lee,M.H., Hazard,S., Brooks-Wilson,A., Hidaka,H., Kojima,H., 
Ose,L., Stalenhoef,A.F., Mietinnen,T., Bjorkhem,I., Bruckert,E., 
Pandya,A., Brewer,H.B. Jr., Salen,G., Dean,M., Srivastava,A. and 
Patel,S.B. 

Two genes that map to the STSL locus cause sitosterolemia: genomic 

structure and spectrum of mutations involving sterolin-1 and 

sterolin-2, encoded by ABCG5 and ABCG8, respectively 

Am. J. Hum. Genet. 69 (2), 278-290 (2001) 

21344600 

11452359 

2 (bases 1 to 4829) 

Lu,K., Yu,H., Lee,M. and Patel,S.B. 

Molecular cloning, genomic structure, and characterization of novel 

mouse head-to-head tandem ABC transporters 

Unpublished 

3 (bases 1 to 4829) 
Lu,K., Lee,M. and Patel,S.B. 
Direct Submission 

Submitted (21-FEB-2001) Division of Endocrinology, Diabetes and 
Medical Genetics, Medical University of South Carolina, 114 Doughty 
St, STB 541, Charleston, SC 29407, USA 

4 (bases 1 to 4829) 

Lu,K., Yu,H., Lee,M. and Patel,S.B. 

Direct Submission 

Submitted (26-AUG-2002) Division of Endocrinology, Diabetes and 
Medical Genetics, Medical University of South Carolina, 114 Doughty 
St, STB 541, Charleston, SC 29403, USA 
Sequence update by submitter 

On Aug 26, 2002 this sequence version replaced gi:15148516. 
Location/Qualifiers 
1..4829 

/organism="Rattus norvegicus" 
/mol_type="mRNA" 
/strain="Sprague-Dawley" 
/db_xref="taxon:10116" 
1..4829 
/gene="Abcg8" 
111..2129 
/gene="Abcg8" 
/codon_start=1 



/product="sterolin-2" 
/protein_id="AAK84831.2" 
/db_xref="GI:22477146" 

/translation="MAEKTKEETQLWNGTVLQDASSLQDSVFSSESDNSLYFTYSGQS 
NTLEVRDLTYQVDMASQVPWFEQLAQFKLPWRSRGSQDSWDLGIRNLSFKVRSGQMLA 
IIGSAGCGRATLLDVITGRDHGGKMKSGQIWINGQPSTPQLIQKCVAHVRQQDQLLPN 
LTVRETLTFIAQMRLPKTFSQAQRDKRVEDVIAELRLRQCANTRVGNTYVRGVSGGER 
RRVSIGVQLLWNPGILILDEPTSGLDSFTAHNLVRTLSRLAKGNRLVLISLHQPRSDI 
FRLFDLVLLMTSGTPIYLGVAQHMVQYFTSIGYPCPRYSNPADFYVDLTSIDRRSKEQ 
EVATMEKAI^LLAALFLEKVQGFDDFLWKAJSAKSLDTGTYAV^ 

GMIQQFTTLIRRQISNDFRDLPTLFIHGAEACLMSLIIGFLYYGHADKPLSFMDMAAL 
LFMIGALIPFNVILDWSKCHSERSLLYYELEDGLYTAGPYFFAKVLGELPEHCAYVI 
IYGMPIYWLTNLRPGPELFLLHFMLLWLVVFCCRTMALAASAMLPTFHMSSFCCNALY 
NSFYLTAGFMINLNNLWIVPAWISKMSFLRWCFSGLMQIQFNGHIYTTQIGNLTFSVP 
GDAMVTAMDLNSHPLYAIYLIVIGISCGFLSLYYLSLKFIKQKSIQDW" 

ORIGIN 

Query Match 5.4%; Score 84; DB 10; Length 4829; 

Best Local Similarity 77.9%; Pred. No. 4.1e-14; 

Matches 134; Conservative 0; Mismatches 20; Indels 18; Gaps 2; 

Qy 2 GAAGCATCCTGAAGTACAGTCCCATTCCACAGCTGGGTCTCTTCTTTGGTTTTCTCAGCC 61 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 172 GAAGCATCCTGGAGTACAGTCCCGTTCCACAGCTGGGTCTCCTCTTTGGTCTTCTCAGCC 113 

Qy 62 ATGACC AGTGCTGTTTGTGCCCTTTGTGTGGCCTCCCCTGCTGTT 106 

I I I I I I I I I I I I II I I I I I I I I I I II I I I I II I I I 

Db 112 ATGACCTGCGGTGTTGTGCCCTTTGTGTGGCTCCTGAGGCCTCCCCTGCTGTTGGCTAGG 53 

Qy 107 GGGCTCTCTCTGTCTTTGCTCCTTAGAGCTGGGGCACCTGAGCCCTCCT 155 

II II I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I 
Db 52 CCAGGATTCTTTCTGTCTTTGCTCCTTAGAGCTAGGGCACTTGAGTCCTCCT 1 
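
The scores in this report can be recomputed from the match statistics under a simple scoring model. The sketch below takes the gap-open penalty of 10.0 and gap-extension penalty of 1.0 from the report header ("Gapop 10.0 , Gapext 1.0"). The match reward of 1.0 and the mismatch penalty of 0.6 are assumptions inferred from the printed numbers, not documented values of the IDENTITY_NUC table, and the function name is illustrative.

```python
MATCH = 1.0       # assumed reward per identical base (inferred from the report)
MISMATCH = 0.6    # assumed penalty per mismatched base (inferred from the report)
GAP_OPEN = 10.0   # "Gapop 10.0" in the report header
GAP_EXTEND = 1.0  # "Gapext 1.0" in the report header


def alignment_score(matches: int, mismatches: int, indels: int, gaps: int) -> float:
    """Score = matches - mismatch penalties - affine gap penalties.

    Each gap is charged GAP_OPEN once, plus GAP_EXTEND for every gapped
    position (the "Indels" count sums gapped positions over all gaps).
    """
    gap_cost = GAP_OPEN * gaps + GAP_EXTEND * indels
    return round(MATCH * matches - MISMATCH * mismatches - gap_cost, 1)


# RESULT 48 (AF351785/c): Matches 134, Mismatches 20, Indels 18, Gaps 2
print(alignment_score(134, 20, 18, 2))
```

Under these assumptions the function reproduces the printed scores of RESULT 45, 46, 48 and 49. It does not model IUPAC ambiguity codes such as Y, S or R, so hits whose aligned region contains them score slightly below what this formula predicts.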



RESULT 49 

AC146282/c 

LOCUS 

DEFINITION 

ACCESSION 
VERSION 
KEYWORDS 
SOURCE 

ORGANISM 



REFERENCE 
AUTHORS 

TITLE 
JOURNAL 
REFERENCE 
AUTHORS 

TITLE 



AC146282 135280 bp DNA linear HTG 02-AUG-2003 

Takifugu rubripes clone MRC-186C24, WORKING DRAFT SEQUENCE, 7 
unordered pieces. 
AC146282 

AC146282.1 GI:33413347 
HTG; HTGS_PHASE1; HTGS_DRAFT. 
Takifugu rubripes (Fugu rubripes) 
Takifugu rubripes 

Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; 
Actinopterygii; Neopterygii; Teleostei; Euteleostei; Neoteleostei; 
Acanthomorpha; Acanthopterygii; Percomorpha; Tetraodontiformes; 
Tetradontoidea; Tetraodontidae; Takifugu. 

1 (bases 1 to 135280) 

Cheng,J.-F., Hamilton,M., Peng,Y., Mukherjee,S., Hosseini,R., 
Peng,Z., Malinov,I. and Rubin,E.M. 
Direct Submission 
Unpublished 

2 (bases 1 to 135280) 

Cheng,J.-F., Hamilton,M., Peng,Y., Mukherjee,S., Hosseini,R., 
Peng,Z., Malinov,I. and Rubin,E.M. 
Direct Submission 



Submitted (02-AUG-2003) Genome Sciences, Lawrence Berkeley National 

Laboratory, 1 Cyclotron Rd., Berkeley, CA 94720, USA 

Draft Sequence Produced by Berkeley PGA 

Web site: http://pga.lbl.gov 

Center Code: PGABERK 

Center Project Name: F069-186C24 

Bac Clone Name: MRC-186C24 

Additional information on comparative analysis and ordering are 
available at: 

http://pga.lbl.gov/cgi-bin/search_cvcgd?type=n&value= 

Funding agent: Programs for Genomic Applications (NHLBI) 

Summary Statistics: 

Sequencing vector: Plasmid; pUC18 

Chemistry: Dye-terminator Big Dye 

Assembly program: Phrap version 0.990329. 

* NOTE: This is a 'working draft' sequence. It currently 
* consists of 7 contigs. The true order of the pieces 
* is not known and their order in this sequence record is 
* arbitrary. Gaps between the contigs are represented as 
* runs of N, but the exact sizes of the gaps are unknown. 
* This record will be updated with the finished sequence 
* as soon as it is available and the accession number will 
* be preserved. 
*        1    28849   contig of 28849 bp in length 
*    28850    28949   gap of unknown length 
*    28950    40654   contig of 11705 bp in length 
*    40655    40754   gap of unknown length 
*    40755    55789   contig of 15035 bp in length 
*    55790    55889   gap of unknown length 
*    55890    70983   contig of 15094 bp in length 
*    70984    71083   gap of unknown length 
*    71084    90702   contig of 19619 bp in length 
*    90703    90802   gap of unknown length 
*    90803   112817   contig of 22015 bp in length 
*   112818   112917   gap of unknown length 
*   112918   135280   contig of 22363 bp in length. 
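
The contig and gap coordinates in the comment above are 1-based and inclusive, so each segment length is end - start + 1. A small sketch that checks this against the stated contig lengths and the record length of 135280 bp; the `SEGMENTS` list is transcribed from the comment, and the helper names are illustrative.

```python
# (start, end, kind) transcribed from the AC146282 COMMENT block
SEGMENTS = [
    (1, 28849, "contig"), (28850, 28949, "gap"),
    (28950, 40654, "contig"), (40655, 40754, "gap"),
    (40755, 55789, "contig"), (55790, 55889, "gap"),
    (55890, 70983, "contig"), (70984, 71083, "gap"),
    (71084, 90702, "contig"), (90703, 90802, "gap"),
    (90803, 112817, "contig"), (112818, 112917, "gap"),
    (112918, 135280, "contig"),
]


def span_length(start: int, end: int) -> int:
    """Length of a 1-based, inclusive coordinate range."""
    return end - start + 1


def contig_lengths(segments):
    """Lengths of the contig segments, in record order."""
    return [span_length(s, e) for s, e, kind in segments if kind == "contig"]


def total_length(segments) -> int:
    """Total record length, counting the placeholder gaps as well."""
    return sum(span_length(s, e) for s, e, _ in segments)


print(contig_lengths(SEGMENTS))
print(total_length(SEGMENTS))
```

Each gap spans 100 positions in the record, which is only the length of the placeholder run of N; as the comment states, the true gap sizes are unknown.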



FEATURES Location/Qualifiers 
source 1..135280 

/organism="Takifugu rubripes" 
/mol_type="genomic DNA" 
/db_xref="taxon:31033" 
/clone="MRC-186C24" 

ORIGIN 

Query Match 4.3%; Score 67.6; DB 2; Length 135280; 

Best Local Similarity 68.1%; Pred. No. 6.3e-09; 

Matches 94; Conservative 0; Mismatches 44; Indels 0; Gaps 0; 

Qy 1158 TTAAAGCAACCGTGTCGGGCCTTGGTGGAACATCAAATCATGCCAGCAGAAGTGGGACAG 1217 

II II I I I I I I 1 I I I I I I I I I I I I I I I I I 111 III I 

Db 33389 TTTCAGTGAGCGTGTGGGTCCCTGGTGGGACTTACCCTCCTGCAGGAAGCGATGGACTCG 33330 

Qy 1218 GCAAATCCTCAAAGATGTCTCCTTGTACATCGAGAGTGGCCAGATTATGTGCATCTTAGG 1277 

I I I I I I I I I I II I I I II I I I II I I I I I I I I I II I I I I I I I I I I I Ml 
Db 33329 TCAGATCCTCAATGATGTCTCTTTCCACGTGGAGAGTGGGCAGATTATGGGCATTCTGGG 33270 






Qy 1278 CAGCTCAGGTAAGTGCCT 1295 

I I I I I I I I I I M 
Db 33269 CAACTCAGGTTTGCACGT 33252 



RESULT 50 

AX685731/c 

LOCUS 

DEFINITION 

ACCESSION 

VERSION 

KEYWORDS 

SOURCE 

ORGANISM 



REFERENCE 
AUTHORS 
TITLE 
JOURNAL 



AX685731 2019 bp DNA linear PAT 29-MAR-2003 

Sequence 3 from Patent WO02081691. 

AX685731 

AX685731.1 GI:29371740 

Mus musculus (house mouse) 
Mus musculus 

Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; 
Mammalia; Eutheria; Rodentia; Sciurognathi; Muridae; Murinae; Mus. 
1 

Hobbs,H.H., Shan,B., Barnes,R. and Tian,H. 
Abcg5 and abcg8: compositions and methods of use 
Patent: WO 02081691-A 3 17-OCT-2002; 
Tularik Inc. (US); BOARD OF REGENTS UNIVERSITY OF TEXAS SYSTEM (US) 



FEATURES 

source 



CDS 



Location/Qualifiers 
1..2019 

/organism="Mus musculus" 
/mol_type="unassigned DNA" 
/db_xref="taxon:10090" 
1..2019 

/note="unnamed protein product; mouse ABCG8 (mABCG8)" 

/codon_start=1 

/protein_id="CAD86571.1" 

/db_xref="GI:29371741" 

/db_xref="REMTREMBL:CAD86571" 

/translation="MAEKTKEETQLWNGTVLQDASGLQDSLFSSESDNSLYFTYSGQS 
NTLEVRDLTYQVDIASQVPWFEQLAQFKIPWRSHSSQDSCELGIRNLSFKVRSGQMLA 
IIGSSGCGRASLLDVITGRGHGGKMKSGQIWINGQPSTPQLVRKCVAHVRQHDQLLPN 
LTVRETLAFIAQMRLPRTFSQAQRDKRVEDVIAELRLRQCANTRVGNTYVRGVSGGER 
RRVSIGVQLLWNPGILILDEPTSGLDSFTAHNLVTTLSRLAKGNRLVLISLHQPRSDI 
FRLFDLVLLMTSGTPIYLGAAQQMVQYFTSIGHPCPRYSNPADFYVDLTSIDRRSKER 
EVATVEKAQSLAALFLEKVQGFDDFLWKAEAKELNTSTHTVSLTLTQDTDCGTAVELP 
GMIEQFSTLIRRQISNDFRDLPTLLIHGSEACLMSLIIGFLYYGHGAKQLSFMDTAAL 
LFMIGALIPFNVILDWSKCHSERSMLYYELEDGLYTAGPYFFAKILGELPEHCAYVI 
IYAMPIYWLTNLRPVPELFLLHFLLVWLWFCCRTMALAASAMLPTFHMSSFFCNALY 
NSFYLTAGFMINLDNLWIVPAWISKLSFLRWCFSGLMQIQFNGHLYTTQIGNFTFSIL 
GDTMISAMDLNSHPLYAIYLIVIGISYGFLFLYYLSLKLIKQKSIQDW" 



ORIGIN 



Query Match 4.0%; Score 63; DB 6; Length 2019; 

Best Local Similarity 100.0%; Pred. No. 1.4e-07; 

Matches 63; Conservative 0; Mismatches 0; Indels 0; Gaps 0; 

Qy  1 CGAAGCATCCTGAAGTACAGTCCCATTCCACAGCTGGGTCTCTTCTTTGGTTTTCTCAGC 60 
      |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 
Db 63 CGAAGCATCCTGAAGTACAGTCCCATTCCACAGCTGGGTCTCTTCTTTGGTTTTCTCAGC 4 

Qy 61 CAT 63 
      ||| 
Db  3 CAT 1 



Search completed: April 29, 2004, 17:06:37 
Job time : 6741.85 secs 



GenCore version 5.1.6 
Copyright (c) 1993 - 2004 Compugen Ltd. 



OM nucleic - nucleic search, using sw model 



Run on: 



April 29, 2004, 14:53:09 ; Search time 756.83 Seconds 

(without alignments) 
8812.639 Million cell updates/sec 



Title: US-09-989-981A-9_COPY_3436_5005 

Perfect score: 1570 

Sequence: 1 cgaagcatcctgaagtacag ctagagagcaaacccagagc 1570 



Scoring table: IDENTITY_NUC 

Gapop 10.0 , Gapext 1.0 

Searched: 3373863 seqs, 2124099041 residues 

Total number of hits satisfying chosen parameters: 6747726 



Minimum DB seq length: 0 

Maximum DB seq length: 2000000000 

Post-processing: Minimum Match 0% 

Maximum Match 100% 
Listing first 50 summaries 
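
The throughput figure in the header can be reproduced from the other header values. The sketch below assumes that one "cell update" is one query base compared against one database residue, and that both strands of each database sequence are searched. The factor of 2 is an inference: it matches the hit count being exactly twice the number of sequences searched (6747726 = 2 × 3373863), but it is not stated in the report. The function name is illustrative.

```python
def million_cell_updates_per_sec(query_len: int, db_residues: int,
                                 seconds: float, strands: int = 2) -> float:
    """Dynamic-programming cells filled per second, in millions."""
    cells = query_len * db_residues * strands
    return cells / seconds / 1e6


# Geneseq search: 1570-base query, 2124099041 residues, 756.83 seconds
rate = million_cell_updates_per_sec(1570, 2_124_099_041, 756.83)
print(f"{rate:.2f} million cell updates/sec")
```

The result agrees with the reported 8812.639 to within the rounding of the search time.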



Database : N_Geneseq_29Jan04:* 

1: geneseqn1980s:* 
2: geneseqn1990s:* 
3: geneseqn2000s:* 
4: geneseqn2001as:* 
5: geneseqn2001bs:* 
6: geneseqn2002s:* 
7: geneseqn2003as:* 
8: geneseqn2003bs:* 
9: geneseqn2003cs:* 
10: geneseqn2004s:* 

Pred. No. is the number of results predicted by chance to have a 
score greater than or equal to the score of the result being printed, 
and is derived by analysis of the total score distribution. 



SUMMARIES 

                     % 
Result             Query 
   No.    Score    Match  Length  DB  ID         Description 

     1     1568     99.9    6043   7  AAD48884   Aad48884 ABCG5-ABC 
     2    358.6     22.8     359   7  AAD48885   Aad48885 Control D 
     3    284.4     18.1    2354   6  ABK51685   Abk51685 Mouse ABC 
     4    242.8     15.5    5460   6  ABK51683   Abk51683 Human ABC 
     5      215     13.7    2512   9  ADB62671   Adb62671 Human cDN 
     6    191.4     12.2    2258   6  AAD22008   Aad22008 Mouse sit 
     7    151.2      9.6     226   3  AAA10131   Aaa10131 Rat liver 





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ALIGNMENTS 



RESULT 1 
AAD48884 

ID AAD48884 standard; DNA; 6043 BP. 
XX 

AC AAD48884; 
XX 

DT 24-MAR-2003 (first entry) 
XX 

DE ABCG5-ABCG8 DNA. 



XX 

KW ABC family cholesterol transporter; ABCG8; sterol-related disorder; 

KW sitosterolaemia; hyperlipidaemia; hypercholesterolaemia; gall stone; 

KW HDL deficiency; atherosclerosis; nutritional deficiency; gene therapy; 

KW ATP-binding cassette; sitosterolaemia susceptibility gene; SSG; ABCG5; 

KW ds. 
XX 

OS Unidentified. 



XX 

FH Key Location/Qualifiers 

FT exon complement(3..104) 

FT /*tag= a 

FT /number= 2 

FT /note= "Corresponds to ABCG8 gene" 

FT intron complement(105..3435) 

FT /*tag= b 

FT /number= 1 

FT /cons_splice= (5'site:NO, 3'site:NO) 

FT /note= "Corresponds to ABCG8 gene" 

FT misc_feature complement(1098..1377) 
FT /*tag= c 

FT /note= "ABCG8 intron1 conserved region" 

FT misc_feature complement(3250..3294) 
FT /*tag= d 

FT /note= "ABCG8 intron1 conserved region" 

FT exon 3436..3498 

FT /*tag= e 

FT /number= 1 

FT /note= "Corresponds to ABCG8 gene" 

FT exon 3858..4003 

FT /*tag= f 

FT /number= 1 

FT /note= "Corresponds to ABCG5 gene" 

FT intron 4004..4598 

FT /*tag= g 

FT /number= 1 

FT /note= "Corresponds to ABCG5 gene" 

FT exon 4599..4720 

FT /*tag= h 

FT /number= 2 

FT /note= "Corresponds to ABCG5 gene" 

FT intron 4721..6043 

FT /*tag= i 

FT /number= 2 

FT /partial 

FT /note= "Corresponds to ABCG5 gene" 

XX 



PN WO200281691-A2. 
XX 

PD 17-OCT-2002. 
XX 

PF 20-NOV-2001; 2001WO-US043823. 
XX 

PR 20-NOV-2000; 2000US-0252235P. 

PR 28-NOV-2000; 2000US-0253645P. 
XX 

PA (TULA-) TULARIK INC. 



PA (TEXA ) UNIV TEXAS SYSTEM. 
XX 

PI Hobbs HH, Shan B, Barnes R, Tian H; 
XX 

DR WPI; 2003-058548/05. 
XX 

PT New ABCG8 polypeptides and nucleic acids, useful for treating sterol- 

PT related disorders e.g. sitosterolemia, hypercholesterolemia, 

PT hyperlipidemia, gall stones, HDL deficiency, atherosclerosis, or 

PT nutritional deficiencies. 

XX 

PS Disclosure; Fig 3; 94pp; English. 
XX 

CC The invention relates to ATP-binding cassette (ABC) family cholesterol 

CC transporter, ABCG8 polypeptides and polynucleotides. The invention also 

CC provides ABCG5 polypeptides and polynucleotides. ABCG5 gene is also known 

CC as sitosterolaemia susceptibility gene (SSG). Sequences of the invention 

CC are useful for treating or preventing sterol-related disorders such as 

CC sitosterolaemia, hyperlipidaemia, hypercholesterolaemia, gall stones, HDL 

CC deficiency, atherosclerosis and nutritional deficiencies. They are also 

CC useful in gene therapy. The present sequence is ABCG8-ABCG5 DNA 
XX 

SQ Sequence 6043 BP; 1378 A; 1509 C; 1497 G; 1654 T; 0 U; 5 Other; 

Query Match 99.9%; Score 1568; DB 7; Length 6043; 

Best Local Similarity 100.0%; Pred. No. 0; 

Matches 1570; Conservative 0; Mismatches 0; Indels 0; Gaps 0; 

Qy 1 CGAAGCATCCTGAAGTACAGTCCCATTCCACAGCTGGGTCTCTTCTTTGGTTTTCTCAGC 60 

I I I I I I I I I I I I I I I I I M I I I I I I I II I II I I I I I I I I II I II I I I I I I I I I I I I I I I I 
Db 3436 CGAAGCATCCTGAAGTACAGTCCCATTCCACAGCTGGGTCTCTTCTTTGGTTTTCTCAGC 3495 

Qy 61 CATGACCAGTGCTGTTTGTGCCCTTTGTGTGGCCTCCCCTGCTGTTGGGCTCTCTCTGTC 120 

I I I I II II I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I II 

Db 3496 CATGACCAGTGCTGTTTGTGCCCTTTGTGTGGCCTCCCCTGCTGTTGGGCTCTCTCTGTC 3555 

Qy 121 TTTGCTCCTTAGAGCTGGGGCACCTGAGCCCTCCTCTGTGCCAGCCTTTCTCCCAGCATT 180 

II I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I II I 1 I 1 I I I I I I I I II I I II I I 

Db 3556 TTTGCTCCTTAGAGCTGGGGCACCTGAGCCCTCCTCTGTGCCAGCCTTTCTCCCAGCATT 3615 

Qy 181 CCTYTCTGGCAAACACTTCCTATAAACACACCGTGTGTTCTGCCTATTGTCGAGATAAGG 240 

I I I I I I I I I I I I I I I I M I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I M I I I 
Db 3616 CCTYTCTGGCAAACACTTCCTATAAACACACCGTGTGTTCTGCCTATTGTCGAGATAAGG 3675 

Qy 241 ACACTCTGGCTAAAGGTACATCAGATAATGGCATCGTTGGCCAAATTGGTGAACTGTTAT 300 

I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I 
Db 3676 ACACTCTGGCTAAAGGTACATCAGATAATGGCATCGTTGGCCAAATTGGTGAACTGTTAT 3735 

Qy 301 CTCACGAGGATTCCAGGGCTGGGTAGGATCGGACAGGGCACTCCCATTGGCTCCTCAGTT 360 

I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I M I 
Db 3736 CTCACGAGGATTCCAGGGCTGGGTAGGATCGGACAGGGCACTCCCATTGGCTCCTCAGTT 3795 

Qy 361 AAAGCTGCCCTGGAGCCGGACAGGCCACTAGAAAATTCACTTGCATTTGCTTCCTGCTAG 420 

I I I I I I I I I I I II I I I I I I I I I I I I II I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 3796 AAAGCTGCCCTGGAGCCGGACAGGCCACTAGAAAATTCACTTGCATTTGCTTCCTGCTAG 3855 



Qy 421 CCATGGGTGAGCTGCCCTTTCTGAGTCCAGAGGGAGCCAGAGGGCCTCACATCAACAGAG 480 



Db 3856 CCATGGGTGAGCTGCCCTTTCTGAGTCCAGAGGGAGCCAGAGGGCCTCACATCAACAGAG 3915 

Qy 481 GGTCTCTGAGCTCCCTGGAGCAAGGTTCGGTCACGGGCACAGAGGCTCGGCACAGCTTAG 540 

I I I I II 1 I II I I I I I I I I I I I I I I I I II I I I I I I I I I I i I I I i I I I I I I I I I I I I I I I I I 

Db 3916 GGTCTCTGAGCTCCCTGGAGCAAGGTTCGGTCACGGGCACAGAGGCTCGGCACAGCTTAG 3975 

Qy 541 GTGTCCTGCATGTGTCCTACAGCGTCAGGTAAGGGGACCTCCACAGCAAAAAGCTAGGCT 600 

I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I 

Db 3976 GTGTCCTGCATGTGTCCTACAGCGTCAGGTAAGGGGACCTCCACAGCAAAAAGCTAGGCT 4035 

Qy 601 CTCTGATTGCCTTTTCTGAATGGGTGGGTGGGCCTGTGGGCTTTGGGTTGTCTGTCCAGC 660 

I I I I I I I II I I I II I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I II 

Db 4036 CTCTGATTGCCTTTTCTGAATGGGTGGGTGGGCCTGTGGGCTTTGGGTTGTCTGTCCAGC 4095 

Qy 661 AGATCAGGGTGAAAGTGGACAGTCTGTAACAACAGTGAGTCGTTCCTCCTCCTCCTCCTG 720 

I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I 

Db 4096 AGATCAGGGTGAAAGTGGACAGTCTGTAACAACAGTGAGTCGTTCCTCCTCCTCCTCCTG 4155 

Qy 721 CGCAGGGCAGAGCCTGGACATTAAAACATGCCCTGCCTGAAGCCGCTTGCTGCTTCTCAC 780 

I I I I I I I I I I II I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I 

Db 4156 CGCAGGGCAGAGCCTGGACATTAAAACATGCCCTGCCTGAAGCCGCTTGCTGCTTCTCAC 4215 

Qy 781 TGATTTCTGCTCTCCCCTTCCTTGACTCGCCCACCACCTGTCCTGTGTAGATGGAGAAGG 840 

I I II I I I I I I I I I I I I I I II I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I 

Db 4216 TGATTTCTGCTCTCCCCTTCCTTGACTCGCCCACCACCTGTCCTGTGTAGATGGAGAAGG 4275 

Qy 841 CTCGGAGAGTGGGGGTGCTGGGGGCACAAAATGGAATGAACACTGCTGAAGGAATGCAGG 900 

I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I M I I I I I I I II 

Db 4276 CTCGGAGAGTGGGGGTGCTGGGGGCACAAAATGGAATGAACACTGCTGAAGGAATGCAGG 4335 

Qy 901 GTTCACTTCAAGAAGAAAGCAGTGTGCAGGTGTACCATCTCCCAGTCAGAGACCCAGTAA 960 

I I I I I I I I I I I I I I I I I I I I I 1 I I I I I I I I II I I I I I I I I I I I I II I I I I I I I I I II I I I 

Db 4336 GTTCACTTCAAGAAGAAAGCAGTGTGCAGGTGTACCATCTCCCAGTCAGAGACCCAGTAA 4395 

Qy 961 TCAGAGCAGCTAATGGGAGGCATGCTCCTTGGGTGGTGGCCAACTTGTCATTATACCTCC 1020 

I I I I I I I I I 1 I I II I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I M I I I I I I 

Db 4396 TCAGAGCAGCTAATGGGAGGCATGCTCCTTGGGTGGTGGCCAACTTGTCATTATACCTCC 4455 

Qy 1021 AAGGACAACAGAGTGGTACATAAGGCTAAAACAGAGTTGTCAACCTGTCCAGGGGCAACT 1080 

I I I I I I II I I I I I I I I I I I I 1 I I II I I I I II I I I I I I I I I I I I I II I I I I I I I I I I I I I I 

Db 4456 AAGGACAACAGAGTGGTACATAAGGCTAAAACAGAGTTGTCAACCTGTCCAGGGGCAACT 4515 

Qy 1081 GGGATGGGGTAGGGCTGGGAGCAGGGGTCTGGCACCTTCCAGGACCCTACTCTGCCTTTG 1140 

I I II I I I I I I I II I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

Db 4516 GGGATGGGGTAGGGCTGGGAGCAGGGGTCTGGCACCTTCCAGGACCCTACTCTGCCTTTG 4575 

Qy 1141 CCCTTGTGGGATTTCCTTTAAAGCAACCGTGTCGGGCCTTGGTGGAACATCAAATCATGC 1200 

I I I I I II I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I II I I I I I I II I I I I II I 

Db 4576 CCCTTGTGGGATTTCCTTTAAAGCAACCGTGTCGGGCCTTGGTGGAACATCAAATCATGC 4635 

Qy 1201 CAGCAGAAGTGGGACAGGCAAATCCTCAAAGATGTCTCCTTGTACATCGAGAGTGGCCAG 1260 

I I II I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I 

Db 4636 CAGCAGAAGTGGGACAGGCAAATCCTCAAAGATGTCTCCTTGTACATCGAGAGTGGCCAG 4695 

Qy 1261 ATTATGTGCATCTTAGGCAGCTCAGGTAAGTGCCTGGGGGGSCSGGGGCTCCTGTACTTC 1320 

I I I I I I I I I I I I I I II I I I I I I I II I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I 



Db 4696 ATTATGTGCATCTTAGGCAGCTCAGGTAAGTGCCTGGGGGGSCSGGGGCTCCTGTACTTC 4755 

Qy 1321 TAAGGCAGGCTCTGGGAGGCTTTGGCTCYGTCTAAGCACAATGTTTAAGAAGTRAGTTTA 1380 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I 1 I I I I I I I I I I I I I I 
Db 4756 TAAGGCAGGCTCTGGGAGGCTTTGGCTCYGTCTAAGCACAATGTTTAAGAAGTRAGTTTA 4815 

Qy 1381 AGTTGTAGAGAGGCAGCCATGCATTTGGCATTTGAATACAATCTGGTGACTTGTCTGGCT 1440 

I I I I 1 I 1 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I 
Db 4816 AGTTGTAGAGAGGCAGCCATGCATTTGGCATTTGAATACAATCTGGTGACTTGTCTGGCT 4875 

Qy 1441 GCCAATAGAACCTAGTACCAAAGTGAAATCTTGAGGAAAATCCCTGGAAAGAGTGGAAAG 1500 

I I I I I I I I I I I I I I I I I I I I I I I II I I I I II II I I I I I I I I I I I I I I I I I 1 I II I I I I I I 
Db 4876 GCCAATAGAACCTAGTACCAAAGTGAAATCTTGAGGAAAATCCCTGGAAAGAGTGGAAAG 4935 

Qy 1501 TCCTGCCTAACACGTAAGTGCCTTCTTTGCTTGTTTGATTGACTGTGATGCTAGAGAGCA 1560 

I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 4936 TCCTGCCTAACACGTAAGTGCCTTCTTTGCTTGTTTGATTGACTGTGATGCTAGAGAGCA 4995 

Qy 1561 AACCCAGAGC 1570 

I I I I I I I I I I 
Db 4996 AACCCAGAGC 5005 



RESULT 2 
AAD48885 

ID AAD48885 standard; DNA; 359 BP. 
XX 

AC AAD48885; 
XX 

DT 24-MAR-2003 (first entry) 
XX 

DE Control DNA fragment flanked by ABCG5-ABCG8 DNA sequence. 
XX 

KW ABC family cholesterol transporter; ABCG8; sterol-related disorder; 

KW sitosterolaemia; hyperlipidaemia; hypercholesterolaemia; gall stone; 

KW HDL deficiency; atherosclerosis; nutritional deficiency; gene therapy; 

KW ATP-binding cassette; sitosterolaemia susceptibility gene; SSG; ABCG5; 

KW ds. 
XX 

OS Unidentified. 
XX 

PN WO200281691-A2. 
XX 

PD 17-OCT-2002. 
XX 

PF 20-NOV-2001; 2001WO-US043823. 
XX 

PR 20-NOV-2000; 2000US-0252235P. 

PR 28-NOV-2000; 2000US-0253645P. 
XX 

PA (TULA-) TULARIK INC. 

PA (TEXA ) UNIV TEXAS SYSTEM. 

XX 

PI Hobbs HH, Shan B, Barnes R, Tian H; 
XX 

DR WPI; 2003-058548/05. 
XX 



PT New ABCG8 polypeptides and nucleic acids, useful for treating sterol- 

PT related disorders e.g. sitosterolemia, hypercholesterolemia, 

PT hyperlipidemia, gall stones, HDL deficiency, atherosclerosis, or 

PT nutritional deficiencies. 

XX 

PS Disclosure; Fig 3; 94pp; English. 
XX 

CC The invention relates to ATP-binding cassette (ABC) family cholesterol 

CC transporter, ABCG8 polypeptides and polynucleotides. The invention also 

CC provides ABCG5 polypeptides and polynucleotides. ABCG5 gene is also known 

CC as sitosterolaemia susceptibility gene (SSG). Sequences of the invention 

CC are useful for treating or preventing sterol-related disorders such as 

CC sitosterolaemia, hyperlipidaemia, hypercholesterolemia, gall stones, HDL 

CC deficiency, atherosclerosis and nutritional deficiencies. They are also 

CC useful in gene therapy. The present sequence is control DNA fragment 

CC flanked by ABCG5-ABCG8 DNA sequence 
XX 

SQ Sequence 359 BP; 68 A; 103 C; 87 G; 100 T; 0 U; 1 Other; 

Query Match 22.8%; Score 358.6; DB 7; Length 359; 

Best Local Similarity 100.0%; Pred. No. 7.5e-103; 

Matches 359; Conservative 0; Mismatches 0; Indels 0; Gaps 0; 

Qy 64 GACCAGTGCTGTTTGTGCCCTTTGTGTGGCCTCCCCTGCTGTTGGGCTCTCTCTGTCTTT 123 

I I I I i I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I ! I I I I I I I I I I I I I 

Db 1 GACCAGTGCTGTTTGTGCCCTTTGTGTGGCCTCCCCTGCTGTTGGGCTCTCTCTGTCTTT 60 

Qy 124 GCTCCTTAGAGCTGGGGCACCTGAGCCCTCCTCTGTGCCAGCCTTTCTCCCAGCATTCCT 183 

II I I I I I I I I t M I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I M I I I 

Db 61 GCTCCTTAGAGCTGGGGCACCTGAGCCCTCCTCTGTGCCAGCCTTTCTCCCAGCATTCCT 120 

Qy 184 YTCTGGCAAACACTTCCTATAAACACACCGTGTGTTCTGCCTATTGTCGAGATAAGGACA 243 

II I I I II I I I I II I M I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 121 YTCTGGCAAACACTTCCTATAAACACACCGTGTGTTCTGCCTATTGTCGAGATAAGGACA 180 

Qy 244 CTCTGGCTAAAGGTACATCAGATAATGGCATCGTTGGCCAAATTGGTGAACTGTTATCTC 303 

I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I M I I I I I I I I I 
Db 181 CTCTGGCTAAAGGTACATCAGATAATGGCATCGTTGGCCAAATTGGTGAACTGTTATCTC 240 

Qy 304 ACGAGGATTCCAGGGCTGGGTAGGATCGGACAGGGCACTCCCATTGGCTCCTCAGTTAAA 363 

I I I I I I I I M I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I 
Db 241 ACGAGGATTCCAGGGCTGGGTAGGATCGGACAGGGCACTCCCATTGGCTCCTCAGTTAAA 300 

Qy 364 GCTGCCCTGGAGCCGGACAGGCCACTAGAAAATTCACTTGCATTTGCTTCCTGCTAGCC 422 

I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I 
Db 301 GCTGCCCTGGAGCCGGACAGGCCACTAGAAAATTCACTTGCATTTGCTTCCTGCTAGCC 359 



RESULT 3 
ABK51685 

ID ABK51685 standard; cDNA; 2354 BP. 
XX 

AC ABK51685; 
XX 

DT 30-JUL-2002 (first entry) 
XX 

DE Mouse ABCG5 cDNA sequence. 



XX 

KW Mouse; ABCG5; ATP-binding cassette gene 5; sitosterolemia; cholesterol; 

KW arteriosclerosis; heart disease; hypersterolemia; Alzheimer's disease; 

KW ss. 
XX 

OS Mus sp. 
XX 

PN WO200227016-A2. 
XX 

PD 04-APR-2002. 
XX 

PF 25-SEP-2001; 2001WO-US029859. 
XX 

PR 25-SEP-2000; 2000US-0235268P. 
XX 

PA (USSH ) US DEPT HEALTH & HUMAN SERVICES. 

PA (PATE/) PATEL S B. 

PA (DEAN/) DEAN M. 
XX 

PI Patel SB, Dean M; 
XX 

DR WPI; 2002-416483/44. 
XX 

PT Novel mammalian ATP-binding cassette gene 5 polypeptide, and the nucleic 

PT acid encoding the polypeptide, useful for treating sitosterolemia, 

PT arteriosclerosis and heart diseases. 
XX 

PS Example 3; Page 45; 66pp; English. 
XX 

CC The present invention relates to a new mammalian ATP-binding cassette 

CC gene 5 (ABCG5) polypeptide. The invention is useful for identifying a 

CC predisposition for developing sitosterolemia, arteriosclerosis or heart 

CC disease. The molecules of the invention are also useful for identifying a 

CC compound which alters ABCG5 activity level comprising contacting a cell 

CC culture or mammal which have ABCG5 polypeptide with a compound and 

CC measuring ABCG5 biological activity in the cell culture or in mammal, 

CC where an increase or decrease in ABCG5 biological activity compared to 

CC ABCG5 biological activity in a control cell culture or mammal not 

CC contacted with the compound, identifies a compound that increases or 

CC decreases ABCG5 activity respectively. The cell culture or mammal 

CC comprises a mutated ABCG5 polypeptide or a wild type polypeptide. The 

CC ABCG5 biological activity, or level of ABCG5 mRNA, or level of the 

CC polypeptide in a cell culture or mammal is also compared with that of a 

CC second cell culture or mammal comprising a wild type ABCG5 polypeptide. 

CC Stimulation of ABCG5 activity is useful for treating or preventing 

CC hypersterolemia, arteriosclerosis, heart disease and/or Alzheimer's

CC disease. The method of the invention is useful for increasing cholesterol 

CC excretion and/or decreasing cholesterol adsorption. The present nucleic 

CC acid sequence represents the cDNA sequence of the mouse ABCG5 gene of the 

CC invention 

XX 

SQ Sequence 2354 BP; 573 A; 604 C; 594 G; 583 T; 0 U; 0 Other; 



Query Match 18.1%; Score 284.4; DB 6; Length 2354; 

Best Local Similarity 98.0%; Pred. No. 8.1e-79; 

Matches 288; Conservative 0; Mismatches 6; Indels 0; Gaps 0 



Qy   285 ATTGGTGAACTGTTATCTCACGAGGATTCCAGGGCTGGGTAGGATCGGACAGGGCACTCC 344
         ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db     1 ATTGGTGAACTGTTATCTCACGAGGATTCCAGGGCTGGGTAGGATCGGACAGGGCACTCC 60

Qy   345 CATTGGCTCCTCAGTTAAAGCTGCCCTGGAGCCGGACAGGCCACTAGAAAATTCACTTGC 404
         ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    61 CATTGGCTCCTCAGTTAAAGCTGCCCTGGAGCCGGACAGGCCACTAGAAAATTCACTTGC 120

Qy   405 ATTTGCTTCCTGCTAGCCATGGGTGAGCTGCCCTTTCTGAGTCCAGAGGGAGCCAGAGGG 464
         ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   121 ATTTGCTTCCTGCTAGCCATGGGTGAGCTGCCCTTTCTGAGTCCAGAGGGAGCCAGAGGG 180

Qy   465 CCTCACATCAACAGAGGGTCTCTGAGCTCCCTGGAGCAAGGTTCGGTCACGGGCACAGAG 524
         ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   181 CCTCACATCAACAGAGGGTCTCTGAGCTCCCTGGAGCAAGGTTCGGTCACGGGCACAGAG 240

Qy   525 GCTCGGCACAGCTTAGGTGTCCTGCATGTGTCCTACAGCGTCAGGTAAGGGGAC 578
         ||||||||||||||||||||||||||||||||||||||||||||  |  | | |
Db   241 GCTCGGCACAGCTTAGGTGTCCTGCATGTGTCCTACAGCGTCAGCAACCGTGTC 294



RESULT 4 
ABK51683 

ID ABK51683 standard; DNA; 5460 BP. 
XX 

AC ABK51683; 
XX 

DT 30-JUL-2002 (first entry) 
XX 

DE Human ABCG5 upstream genomic sequence, exon 1, intron 1 and exon 2. 
XX 

KW Human; ABCG5; ATP-binding cassette gene 5; sitosterolemia; cholesterol;
KW arteriosclerosis; heart disease; hypersterolemia; Alzheimer's disease;
KW chromosome 2p21; ds.
XX

OS Homo sapiens.
XX 

PN WO200227016-A2. 
XX 

PD 04-APR-2002. 
XX 

PF 25-SEP-2001; 2001WO-US029859.
XX 

PR 25-SEP-2000; 2000US-0235268P. 
XX 

PA (USSH ) US DEPT HEALTH & HUMAN SERVICES. 
PA (PATE/) PATEL S B. 
PA (DEAN/) DEAN M. 
XX 

PI Patel SB, Dean M; 
XX 

DR WPI; 2002-416483/44. 
XX 

PT Novel mammalian ATP-binding cassette gene 5 polypeptide, and the nucleic 
PT acid encoding the polypeptide, useful for treating sitosterolemia, 
PT arteriosclerosis and heart diseases. 
XX 



PS Example 3; Page 38-41; 66pp; English. 
XX 

CC The present invention relates to a new mammalian ATP-binding cassette 

CC gene 5 (ABCG5) polypeptide. The invention is useful for identifying a 

CC predisposition for developing sitosterolemia, arteriosclerosis or heart 

CC disease. The molecules of the invention are also useful for identifying a 

CC compound which alters ABCG5 activity level comprising contacting a cell 

CC culture or mammal which have ABCG5 polypeptide with a compound and 

CC measuring ABCG5 biological activity in the cell culture or in mammal, 

CC where an increase or decrease in ABCG5 biological activity compared to 

CC ABCG5 biological activity in a control cell culture or mammal not 

CC contacted with the compound, identifies a compound that increases or 

CC decreases ABCG5 activity respectively. The cell culture or mammal 

CC comprises a mutated ABCG5 polypeptide or a wild type polypeptide. The 

CC ABCG5 biological activity, or level of ABCG5 mRNA, or level of the 

CC polypeptide in a cell culture or mammal is also compared with that of a 

CC second cell culture or mammal comprising a wild type ABCG5 polypeptide. 

CC Stimulation of ABCG5 activity is useful for treating or preventing 

CC hypersterolemia, arteriosclerosis, heart disease and/or Alzheimer's 

CC disease. The method of the invention is useful for increasing cholesterol 

CC excretion and/or decreasing cholesterol adsorption. The present nucleic 

CC acid sequence represents the upstream genomic sequence, exon 1, intron 1 

CC and exon 2 of the human ABCG5 gene located on chromosome 2p21 

XX 

SQ Sequence 5460 BP; 1351 A; 1350 C; 1508 G; 1243 T; 0 U; 8 Other; 

Query Match 15.5%; Score 242.8; DB 6; Length 5460; 

Best Local Similarity 54.9%; Pred. No. 2.1e-65; 

Matches 649; Conservative 1; Mismatches 488; Indels 44; Gaps 7; 



Qy 120 CTTTGCTCCTTAGAGCTGGGGCACCTGAGCCCTCCTCTGTGCCAGCCTTTCTCCCAGCAT 179

II 1 1 1 M 1 1 1 1 1 1 1 1 1 1 1 II II! 1 1 II 1 1 1 1 1 1 1 II 1 1 1 1
Db 4310 CTCTGTTTCTTGGAGCAGGGACACCTCGGCCTCCTGCCCTGGGCCCGTCTCTCCCAGCAT 4369

Qy 180 TCCTYTCTGGCAAACACTTCCTATAAACACACCGTGTGTTCTGCCTATTGTCGAGATAAG 239

1 1 II : 1 1 II 1 1 1 1 1 1 1 1 1 1 1 1 1 1 II 1 1 1 1 1 1 1 1 1 1 I 1
Db 4370 TCCTTGCTGGCAAGCCCACC T AC AAAC GT GTGTGTTCTTGCC C AC T GT C AAGAT AAG 4426

Qy 240 GACACTCTGGCTAAAGGTACATCAGATAATGGCATCGTTGGCCAAATTGGTGAACTGTTA 299

I I 1 1 1 1 II 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 II 1 1 1 1 1 1 1 1 1 1 1 1 1
Db 4427 GACGCGCTGGCTAAAGGTACATCAGATAATGGTCTCCGTGGCCAAGTCCCAGTCCTGCTG 4486

Qy 300 TCTCACGAGGATTCCAGGGCTGGGTAGGATCGGACAGGGCACTCCCATTGGCTCCTCAGT 359

1 1 1 1 1 1 II 1 1 1 1 1 1 II 1 1 1 1 1 1 1 1 1 1 1 1 1 1
Db 4487 TCCCAAGGGACTCCGGGGTCAGGTGGAGCAGGCAGGGCAGTCTGCCACGGGCTCCCCAAC 4546

Qy 360 TAAAGCTGCCCTGGAGCCGGACAGGCCACTAGAAAATTCACTTGCATTTGCTTCCTGCTA 419

I I 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
Db 4547 TGAAGCCACTCTGGGGAGGGTCCGGCCACCAGAAAATTTGCCCAGCTTTGCTGCCTGTTG 4606

Qy 420 GCCATGGGTGAGCTGCCCTTTCTGAGTCCAGAGGGAGCCAGAGGGCCTCACATCAACAGA 479

II 1 1 1 II 1 1 1 1 1 1 1 1 1 1 1 1 III II M 1 III II 1 1 1 1 1 1 1
Db 4607 GCCATGGGTGACCTCTCATCTTTGACCCCCGGAGGGTCCATGGGTCTCCAAGTAAACAGA 4666

Qy 480 GGGTCTCTGAGCTCCCTGGAGCAAGGTTCGGTCACGGGCACAGAGGCTCGGCACAGCTTA 539

1 1 II 1 1 1 1 1 1 1 1 1 M 1 1 1 1 1 1 1 1 1 1 1 1 1 M 1 1 1 1 1 1 1 1 1 1
Db 4667 GGCTCCCAGAGCTCCCTGGAGGGGGCTCCTGCCACCGCCCCGGAGCCT---CACAGCCTG 4723



Qy 540 GGTGTCCTGCATGTGTCCTACAGCGTCAGGTAAGGGGACCTCCACAGCAAAAAGCTAGGC 599

II I I I I I I I I I 1 I I I I I I I I I I I I I I I I I I II I II 

Db 4724 GGCATCCTCCATGCCTCCTACAGCGTCAGGTAAGG-------------CAGAGCCCTTGC 4770

Qy 600 TCTCTGATTGCCTTTTCTGAATGGGTGGGTGGGCCTGTGGGCTTTGGGTTGTCTGTCCAG 659 

I II I I I I I I I I I I I I I I I I 

Db 4771 TGCTGCTGCTCCCCCAGGAGTGCGGGGCCCGGCGCTCACCCCTCTGCTGCCTTTCTTCAC 4830

Qy 660 CAGATCAGGGTGAAAGTGGACAGTCTGTAACAACAGTGAGTCGTTCCTCCTCCTCCTCCT 719

I I I I I I I I I I I I II I I I 

Db 4831 TCTTTAAGTGCCAGTCTGGGCACTTCGGGCTCCCTCTTTAGTGGATCGGGTGGAGAGAGG 4890

Qy 720 GCGCAGGGCAGAGCCTGGACATTAAAACATGCCCTGCCTGAAGCCGCTTGCTGCTTCTCA 779

I I I I I I I I I I I I I I III I I I I I I 

Db 4891 AGAGGGAGAAGGGCTGTTGCTGGGAAACATGGAGCGACAGTGAATGGCCCCTCCCCCTGC 4950

Qy 780 CTGATTTCTGCTCTCCCCTTCCTTGACTCGCC-------CACCACCTGTCCTGTGTAGAT 832

I III II I II II II I I I I I I I 

Db 4951 CCAGGGAAGGGCCTGGGCATAAACAAAGTGGCAGCAGTGCCCTGCCAACCCAGTGTCTAC 5010 

Qy 833 GGAGAAGGCTC------GGAGAGTGGGGGTGCTGGGGGCACAAAATGGAATGAACACTGC 886

II III I II I I I I I I I I I I I III I I I I I III 

Db 5011 GGCCTGCCCTCTGTGGATGGGAATGGGGGTACTGCGAATGCAAGGAGTCTTGAAACCTGG 5070 

Qy 887 TGAAGGAATGCAGGGTTCACTTCAAGAAGAAAGCAGTGTGCAGGTGTACCATCTCCCAGT 946

I I I I i I I I I I I II I II I I I I I II II II 

Db 5071 T GAAAGAAT GC AGGG ACAG C CAC CT C GC AG C CAAAC GGAC AG GAC AT T C AGAG C 5124 

Qy 947 CAGAGACCCAGTAATCAGAGCAGCTAATGGGAGGCATGCTCCTTGGGTGGTGGCCAACTT 1006 

I II I I I I I II I I I I I I I I I I I I I I 

Db 5125 AACTCCAGCACAGGCCCCCTCCCTACGTGGCAGACAGCCTCAGTCGCTATCTGCCAGGTT 5184 

Qy 1007 GTCATTATACCTCCAAGGACAACAGAGTGGTACATAAGGCTAAAACAGAGTTGTCAACCT 1066

I I I I I I I I I I I I I II 

Db 5185 CT AC AGAG GAGG G C GC AGAGACT GAAAC AC GT T AG GAGC CT GTCCGGAGACTAC 5238

Qy 1067 GTCCAGGGGCAACTGGGATGGGGTAGGGCTGGGAGCAGGGGTCTGGCACCTTCCAGGACC 1126

ill II II I I I I I I I I I I I I I I I I I I I I I I I I I I I 

Db 5239 TGGGGGTGGGGCACAGGTAGGATCAATGCTGGGGACCTGGGTGTGGCCCCTTCCAGGGCC 5298

Qy 1127 CTACTCTGCCTTTGCCCTTGTGGGATTTCCTTTAAAGCAACCGTGTCGGGCCTTGGTGGA 1186

II I I I I I I I I I I I I I I II I I I I I I I I I I II I I I I I I I I I I I I I I I I I 

Db 5299 CCAAGCTGCCTTTGCCTTCCTGGGGTTTCCTTTAAAGCCACCGCGTGAGGCCCTGGTGGG 5358 

Qy 1187 ACATCAAATCATGCCAGCAGAAGTGGGACAGGCAAATCCTCAAAGATGTCTCCTTGTACA 1246 

I I I I I I III I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I II I I I I I I I I I I 
Db 5359 ACATCACATCTTGCCGGCAGCAGTGGACCAGGCAGATCCTCAAAGATGTCTCCTTGTACG 5418

Qy 1247 TCGAGAGTGGCCAGATTATGTGCATCTTAGGCAGCTCAGGTA 1288

I I I I I I II I I I I I I I I I I I I I I MM II I I I I I I I I 
Db 5419 TGGAGAGCGGGCAGATCATGTGCATCCTAGGAAGCTCAGGTA 5460



RESULT 5 
ADB62671 

ID ADB62671 standard; cDNA; 2512 BP. 



XX 

AC ADB62671; 
XX 

DT 04-DEC-2003 (first entry) 
XX 

DE Human cDNA encoding clone LIVER20030650.
XX 

KW Human; ss; gene; pharmaceutical; diagnostic; gene therapy; 

KW tissue regeneration; cell regeneration; membrane protein; 

KW signal transduction-related protein; transcription-related protein; 

KW osteoporosis; neurological disease; cancer; tumour. 

XX 

OS Homo sapiens.
XX 

FH Key Location/Qualifiers

FT CDS 1469..2239

FT /*tag= a

FT /product= "Clone LIVER20030650 protein"
XX 

PN EP1308459-A2.
XX 

PD 07-MAY-2003. 
XX 

PF 28-MAR-2002; 2002EP-00007401.
XX

PR 05-NOV-2001; 2001JP-00379298.

PR 25-JAN-2002; 2002US-00350978.
XX 

PA (HELI-) HELIX RES INST. 

PA (REAS-) RES ASSOC BIOTECHNOLOGY. 

XX 

PI Isogai T, Sugiyama T, Otsuki T, Wakamatsu A, Sato H, Ishii S; 

PI Yamamoto J, Isono Y, Hio Y, Otsuka K, Nagai K, Irie R, Tamechika I; 

PI Seki N, Yoshikawa T, Otsuka M, Nagahari K, Masuho Y; 

XX 

DR WPI; 2003-450961/43. 

DR P-PSDB; ADB64641. 
XX 

PT New polynucleotides and polypeptides, useful for developing a diagnostic 

PT marker or medicines for regulation of their expression and activity, or 

PT as targets of gene therapy. 
XX 

PS Claim 1; Page; 222pp; English. 
XX 

CC The invention discloses a polynucleotide comprising a sequence selected 

CC from 1970 fully defined nucleotide sequences which encode novel 

CC polypeptides. Also claimed is a polypeptide encoded by the polynucleotide 

CC or its partial peptide, an antibody binding to the polypeptide or peptide 

CC of the polynucleotide, immunologically assaying the polypeptide or 

CC peptide of the polynucleotide by contacting the polypeptide or peptide 

CC with the antibody of the encoded protein, and observing the binding 

CC between the two, a transformant carrying the polynucleotide in an

CC expressible manner and an antisense polynucleotide. The oligonucleotide 

CC is useful as a primer for synthesising the polynucleotide, or as a probe 

CC for detecting the polynucleotide. The polynucleotides and encoded 

CC proteins are useful as pharmaceutical agents and many disease-related 

CC genes may be included in them, for developing a diagnostic marker or 



CC medicines for regulation of their expression and activity, or as targets 

CC of gene therapy. The genes are involved in tissue and/or cell 

CC regeneration. Membrane proteins, signal transduction-related proteins,

CC transcription-related proteins, disease-related proteins and genes 

CC encoding them can be used as indicators for diseases (e.g. osteoporosis, 

CC neurological diseases, cancer, tumours). The cDNA may be used to regulate

CC the activity or expression of the encoded protein to treat diseases. The 

CC sequence presented is a cDNA of the invention. Note: Some of the sequence 

CC data for this patent is not represented in the printed specification, but 

CC is based on sequence information supplied by the European Patent Office. 

XX 

SQ Sequence 2512 BP; 543 A; 675 C; 701 G; 593 T; 0 U; 0 Other; 

Query Match 13.7%; Score 215; DB 9; Length 2512; 

Best Local Similarity 54.5%; Pred. No. 9.2e-57; 

Matches 576; Conservative 0; Mismatches 450; Indels 31; Gaps 6 

Qy 237 AAGGACACTCTGGCTAAAGGTACATCAGATAATGGCATCGTTGGCCAAATTGGTGAACTG 296 

I I I I I I I I I I I 1 I I I I I I I I I I I I I I II I I ! I I II I I I I I I I I III 
Db 1 AAGGACGCGCTGGCTAAAGGTACATCAGATAATGGTCTCCGTGGCCAAGTCCCATTCCTG 60 

Qy 297 TTATCTCACGAGGATTCCAGGGCTGGGTAGGATCGGACAGGGCACTCCCATTGGCTCCTC 356 

I II I I I I II I I I I I I II I I I I I I I I I I I I I 

Db 61 CTGTCCCAAGGGACTCCGGGGTCAGGTGGAGCAGGCAGGGCAGTCTGCCACGGGCTCCCC 120

Qy 357 AGTTAAAGCTGCCCTGGAGCCGGACAGGCCACTAGAAAATTCACTTGCATTTGCTTCCTG 416 

I I I II I I I M I I I I I I I I I I I I I II I I I I I II I I I I I I II 

Db 121 AACTGAAGCCACTCTGGGGAGGGTCCGGCCACCAGAAAATTTGCCCAGCTTTGCTGCCTG 180 

Qy 417 CTAGCCATGGGTGAGCTGCCCTTTCTGAGTCCAGAGGGAGCCAGAGGGCCTCACATCAAC 476

I I I I I I I I I I I I II I I I I I I III II III III II I II I 
Db 181 TTGGCCATGGGTGACCTCTCATCTTTGACCCCCGGAGGGTCCATGGGTCTCCAAGTAAAC 240

Qy 477 AGAGGGTCTCTGAGCTCCCTGGAGCAAGGTTCGGTCACGGGCACAGAGGCTCGGCACAGC 536 

Mill II I I I I I I I I I I I I I I I I I I III I I I III II I I I I I I 
Db 241 AGAGGCTCCCAGAGCTCCCTGGAGGGGGCTCCTGCCACCGCCCCGGAGCCT---CACAGC 297

Qy 537 TTAGGTGTCCTGCATGTGTCCTACAGCGTCAGGTAAGGGGACCTCCACAGCAAAAAGCTA 596 

III I I II I I I I I II I I II I I I I I II I I I I I I II III 
Db 298 CTGGGCATCCTCCATGCCTCCTACAGCGTCAGGTAAGGCAGAGCCC--------TTGCTG 349

Qy 597 GGCTCTCTGATTGCCTTTTCTGAATGGGTGGGTGGGCCTGTGGGCTTTGGGTTGTCTGTC 656 

II II II I I I I I I I I I I I 

Db 350 CTGCTGCTCCCCCAGGAGTGCGGGGCCCGGCGCTCACCCCTCTGCTGCCTTTCTTCACTC 409

Qy 657 CAGCAGATCAGGGTGAAAGTGGACAGTCTGTAACAACAGTGAGTCGTTCCTCCTCCTCCT 716 

I I I I I I I I I I I I I I II 

Db 410 TTTAAGTGCCAGTCTGGGCACTTCGGGCTCCCTCTTTAGTGGATCGGGTGGAGAGAGGAG 469

Qy 717 CCTGCGCAGGGCAGAGCCTGGACATTAAAACATGCCCTGCCTGAAGCCGCTTGCTGCTTC 776

I I I I I I I I I I I I I I I I II II III I I I I 

Db 470 AGGGAGAAGGGCTGTGCTGGGAAACATGGAGCGACAGTGAATGGCCCCTCCCCCTGCCCA 529 

Qy 777 TCACTGATTTCTGCTCTCCCCTTCCTTGACTC-GCCCACCACCTGTCCTGTGTAGATGGA 835 

I I I I II I I I I I I I I I I I I I I I 

Db 530 GGGAAGGGCCTGGGCATAAACAAAGTGGCAGCAGTGCCCTGCCAACCCAGTGTCTACGGC 589 



Qy 836 GAAGGCTC------GGAGAGTGGGGGTGCTGGGGGCACAAAATGGAATGAACACTGCTGA 889

III I I I I 1 I I I I I I I I I III I I I I I I I I I I I 

Db 590 CTGCCCTCTGTGGATGGGAATGGGGGTACTGCGAATGCAAGGAGTCTTGAAACCTGGTGA 649

Qy 890 AGGAATGCAGGGTTCACTTCAAGAAGAAAGCAGTGTGCAGGTGTACCATCTCCCAGTCAG 949

I I I I I II I I I I II I I I I I M I I I I I 

Db 650 AAGAAT G C AG GG AC AGC C AC C T C GC AGC C AAAC GGACAGGAC AT T C AGAGC AAC 703 

Qy 950 AGACCCAGTAATCAGAGCAGCTAATGGGAGGCATGCTCCTTGGGTGGTGGCCAACTTGTC 1009 

II I I I I I I I I I I I I I I I I I I I I I I 

Db 704 TCCAGCACAGGCCCCCTCCCTACGTGGCAGACAGCCTCAGTCGCTATCTGCCAGGTTCT- 762

Qy 1010 ATTATACCTCCAAGGACAACAGAGTGGTACATAAGGCTAAAACAGAGTTGTCAACCTGTC 1069

I I I I I I I I I I I II I II I 

Db 763 ------ACAGAGGAGGGCGCAGAGACTGAAACACGTTAGGAGCCTGTCCGGAGACTACTG 816

Qy 1070 CAGGGGCAACTGGGATGGGGTAGGGCTGGGAGCAGGGGTCTGGCACCTTCCAGGACCCTA 1129 

III II II I I I I I I I I I I I I I I I I I I I I II I I I I I I I 

Db 817 GGGTGGGGCACAGGTAGGATCAATGCTGGGGACCTGGGTGTGGCCCCTTCCAGGGCCCCA 876

Qy 1130 CTCTGCCTTTGCCCTTGTGGGATTTCCTTTAAAGCAACCGTGTCGGGCCTTGGTGGAACA 1189 

I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I M 111 
Db 877 AGCTGCCTTTGCCTTCCTGGGGTTTCCTTTAAAGCCACCGCGTGAGGCCCTGGTGGGACA 936

Qy 1190 TCAAATCATGCCAGCAGAAGTGGGACAGGCAAATCCTCAAAGATGTCTCCTTGTACATCG 1249

III III I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I M I I I I I I I I I I I I 
Db 937 TCACATCTTGCCGGCAGCAGTGGACCAGGCAGATCCTCAAAGATGTCTCCTTGTACGTGG 996 

Qy 1250 AGAGTGGCCAGATTATGTGCATCTTAGGCAGCTCAGG 1286

I I I I II I I I I I I I I I I I I II I I I I I I I I I II I 
Db 997 AGAGCGGGCAGATCATGTGCATCCTAGGAAGCTCAGG 1033



RESULT 6 
AAD22008 

ID AAD22008 standard; DNA; 2258 BP. 
XX 

AC AAD22008;
XX 

DT 12-FEB-2002 (first entry) 
XX 

DE Mouse sitosterolaemia susceptibility gene (SSG). 
XX 

KW Mouse; sitosterolaemia susceptibility gene; SSG; atherosclerosis; 

KW sterol-related disorder; hyperlipidaemia; hypercholesterolaemia; therapy;

KW gall stone; coronary heart disease; cardiovascular disease; arthritis; 

KW xanthoma; haemolytic anaemia; transgenic animal; chromosome 17; ds.

XX 

OS Mus sp. 
XX 

FH Key Location/Qualifiers 
FT CDS 47..2005

FT /*tag= a

FT /product= "Mouse SSG protein"

XX 

PN WO200179272-A2.
XX 



PD 25-OCT-2001. 
XX 

PF 18-APR-2001; 2001WO-US012758.
XX

PR 18-APR-2000; 2000US-0198465P.

PR 15-MAY-2000; 2000US-0204234P.
XX 

PA (TULA-) TULARIK INC. 
XX 

PI Tian H, Schultz J, Shan B; 
XX 

DR WPI; 2002-017598/02. 

DR P-PSDB; AAE13289. 
XX 

PT Novel sitosterolemia susceptibility gene polypeptide and polynucleotide, 

PT useful for screening a compound that increases the level of expression or 

PT activity of SSG polypeptide for treating sterol-related disorder. 
XX 

PS Claim 8; Fig 7; 105pp; English. 
XX 

CC The invention relates to an isolated Sitosterolaemia Susceptibility Gene

CC (SSG) polypeptide. SSG is a member of adenosine triphosphate (ATP) 

CC binding cassette (ABC) family cholesterol transporter. SSG is useful for 

CC identifying a compound useful in the treatment or prevention of a sterol- 

CC related disorder, including sitosterolaemia, hyperlipidaemia, 

CC hypercholesterolemia, gall stones, HDL deficiency, atherosclerosis or 

CC nutritional deficiencies. SSG is also useful for treating cholesterol- 

CC associated diseases or conditions including coronary heart disease and 

CC other cardiovascular diseases, and sitosterolaemia-associated condition 

CC including arthritis, xanthomas and chronic haemolytic anaemia. SSG 

CC expression cassette is useful in the production of transgenic non-human 

CC animals. SSG genes and their homologues are useful as tools for a number 

CC of applications including diagnosing sitosterolaemia and other 

CC cardiovascular disorders, for forensics and paternity determinations, and 

CC for treating any of a large number of SSG associated diseases. The 

CC present sequence is mouse SSG DNA. Mouse SSG is located on chromosome 17 

XX 

SQ Sequence 2258 BP; 549 A; 579 C; 567 G; 563 T; 0 U; 0 Other; 



Query Match 12.2%; Score 191.4; DB 6; Length 2258; 

Best Local Similarity 97.0%; Pred. No. 2.7e-49; 

Matches 195; Conservative 0; Mismatches 6; Indels 0; Gaps 0 

Qy   378 GGACAGGCCACTAGAAAATTCACTTGCATTTGCTTCCTGCTAGCCATGGGTGAGCTGCCC 437
         ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db     2 GGACAGGCCACTAGAAAATTCACTTGCATTTGCTTCCTGCTAGCCATGGGTGAGCTGCCC 61

Qy   438 TTTCTGAGTCCAGAGGGAGCCAGAGGGCCTCACATCAACAGAGGGTCTCTGAGCTCCCTG 497
         ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    62 TTTCTGAGTCCAGAGGGAGCCAGAGGGCCTCACATCAACAGAGGGTCTCTGAGCTCCCTG 121

Qy   498 GAGCAAGGTTCGGTCACGGGCACAGAGGCTCGGCACAGCTTAGGTGTCCTGCATGTGTCC 557
         ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   122 GAGCAAGGTTCGGTCACGGGCACAGAGGCTCGGCACAGCTTAGGTGTCCTGCATGTGTCC 181

Qy   558 TACAGCGTCAGGTAAGGGGAC 578
         |||||||||||  |  | | |
Db   182 TACAGCGTCAGCAACCGTGTC 202



RESULT 7 
AAA10131 

ID AAA10131 standard; cDNA; 226 BP. 
XX 

AC AAA10131; 
XX 

DT 03-JUL-2000 (first entry) 
XX 

DE Rat liver toxicological response marker, SEQ ID NO: 24. 
XX 

KW Toxicological response marker; rat; liver; expression pattern; 

KW toxicity screening; toxic compound; polycyclic aromatic hydrocarbon; PAH; 

KW benzo(a)pyrene; clofibrate; acetaminophen; ss.

XX 

OS Rattus norvegicus. 
XX 

PN WO200012760-A2. 
XX 

PD 09-MAR-2000.
XX

PF 27-AUG-1999; 99WO-US019768.
XX

PR 28-AUG-1998; 98US-00141825.

PR 13-OCT-1998; 98US-00172108.

PR 13-OCT-1998; 98US-00172711.
XX 

PA (INCY-) INCYTE PHARM INC. 
XX 

PI Cunningham MJ, Zweiger GB, Panzer SR, Seilhamer JJ; 
XX 

DR WPI; 2000-237888/20. 
XX 

PT Isolated and purified nucleic acid molecules used as toxicological 

PT response markers for detecting and diagnosing a potential toxicological 

PT response in a mammalian subject to a test compound or molecule. 

XX 

PS Claim 6; Page 44; 76pp; English. 
XX 

CC Sequences AAA10108-A10224 represent rat liver toxicological response 

CC markers. These were identified by their pattern of at least twofold 

CC upregulation or downregulation of expression in rat liver treated with a 

CC toxic compound (e.g., clofibrate, acetaminophen or polycyclic aromatic 

CC hydrocarbons (PAHs) such as benzo(a)pyrene). Fluorescently labelled rat

CC liver mRNA was contacted with a microarray comprising a library of rat 

CC cDNA molecules. Twofold or larger changes in hybridisation were only 

CC observed between the sample mRNA and sequences AAA10108-A10224. In

CC particular, sequences AAA10110, AAA10116, AAA10117, AAA10120, AAA10126, 

CC AAA10133, AAA10138, AAA10140, AAA10142-A10144, AAA10146, AAA10149, 

CC AAA10164, AAA10174, AAA10185, AAA10188, AAA10189, AAA10201 and AAA10205 

CC were all upregulated in samples treated with known toxic compounds 

CC relative to untreated samples, while sequences AAA10150, AAA10156, 

CC AAA10157, AAA10159-A10163, AAA10166, AAA10168, AAA10170, AAA10175, 

CC AAA10178, AAA10181, AAA10192, AAA10194, AAA10197, AAA10202, AAA10209, 

CC AAA10210, AAA10212 and AAA10222 were all downregulated. Expression of 






CC these sequences is therefore modulated in liver during a metabolic 

CC response to a toxic compound. The markers may be used as probes to 

CC determine the toxicity of a test compound. A tissue sample from an animal 

CC treated with the test compound is obtained, labelled (e.g., with a 

CC fluorophore) and then contacted with a microarray comprising the markers. 

CC The expression pattern of the markers may then be compared with the 

CC marker expression pattern in untreated control samples, and the toxicity 

CC of the test compound determined. The tissue sample is preferably selected 

CC from liver, kidney, brain, spleen, pancreas and lung. The nucleic acid 

CC molecules and methods of the invention may also be used for screening 

CC libraries of molecules for specific binding affinity, and for the fine- 

CC tuning of treatment regimens which use drugs with toxic side-effects such 

CC that the side-effects are minimised without compromising the efficacy of 

CC the drug 

XX 

SQ Sequence 226 BP; 51 A; 62 C; 67 G; 46 T; 0 U; 0 Other; 

Query Match 9.6%; Score 151.2; DB 3; Length 226;

Best Local Similarity 87.8%; Pred. No. 4.1e-37; 

Matches 165; Conservative 0; Mismatches 23; Indels 0; Gaps 0; 



Qy   391 GAAAATTCACTTGCATTTGCTTCCTGCTAGCCATGGGTGAGCTGCCCTTTCTGAGTCCAG 450
         ||  |||||||  ||||||||||| ||| |||||| ||||||||||||||||||||||||
Db     1 GAGGATTCACTCACATTTGCTTCCCGCTGGCCATGAGTGAGCTGCCCTTTCTGAGTCCAG 60

Qy   451 AGGGAGCCAGAGGGCCTCACATCAACAGAGGGTCTCTGAGCTCCCTGGAGCAAGGTTCGG 510
         ||||||||||||||||||||| |||||||||||||| ||||||||||||| |||| || |
Db    61 AGGGAGCCAGAGGGCCTCACAACAACAGAGGGTCTCAGAGCTCCCTGGAGGAAGGCTCAG 120

Qy   511 TCACGGGCACAGAGGCTCGGCACAGCTTAGGTGTCCTGCATGTGTCCTACAGCGTCAGGT 570
         | || ||| ||||||||||||||||||||||||||||| ||||||||| |||||||||
Db   121 TTACAGGCTCAGAGGCTCGGCACAGCTTAGGTGTCCTGAATGTGTCCTTCAGCGTCAGCA 180

Qy   571 AAGGGGAC 578
         |  | | |
Db   181 ACCGTGTC 188





RESULT 8 
AAD42155 

ID AAD42155 standard; DNA; 226 BP. 
XX 

AC AAD42155; 
XX 

DT 04-NOV-2002 (first entry) 
XX 

DE Rat target DNA #24. 
XX 

KW Rat; microarray; gene expression; toxicological effect; therapy; ds.
XX 

OS Rattus norvegicus. 
XX 

PN US6403778-B1. 
XX 

PD 11-JUN-2002.
XX

PF 28-AUG-1998; 98US-00141825.
XX

PR 04-MAY-1998; 98US-0084029P.
XX 

PA (INCY-) INCYTE GENOMICS INC. 
XX 

PI Cunningham MJ, Zweiger GB, Panzer SR, Seilhamer JJ; 
XX 

DR WPI; 2002-536048/57. 
XX 

PT Composition useful as hybridizable array elements in a microarray, for 

PT screening compounds for toxicological responses, has many polynucleotide 

PT targets derived from rat liver cDNA libraries and rat kidney libraries.
XX 

PS Claim 1; Col 29-30; 23pp; English. 
XX 

CC The invention relates to a composition comprising a plurality of 

CC polynucleotide targets. The polynucleotide targets are derived from rat

CC liver cDNA libraries and rat kidney libraries. The composition can be 

CC immobilised on a substrate and used as hybridisable array elements in a 

CC microarray format. The microarray is used to characterise gene expression 

CC patterns associated with novel compounds to elucidate any toxicological 

CC effects or to monitor the effects of therapeutic treatments, where 

CC toxicological effects may be expected. The composition is also useful for 

CC screening compounds and/or therapeutic treatments for potential 

CC toxicological effects and for screening a sample's toxicological response 

CC to a particular test compound. The present sequence is rat target DNA 

XX 

SQ Sequence 226 BP; 51 A; 62 C; 67 G; 46 T; 0 U; 0 Other; 

Query Match 9.6%; Score 151.2; DB 6; Length 226; 

Best Local Similarity 87.8%; Pred. No. 4.1e-37; 

Matches 165; Conservative 0; Mismatches 23; Indels 0; Gaps 0; 

Qy   391 GAAAATTCACTTGCATTTGCTTCCTGCTAGCCATGGGTGAGCTGCCCTTTCTGAGTCCAG 450
         ||  |||||||  ||||||||||| ||| |||||| ||||||||||||||||||||||||
Db     1 GAGGATTCACTCACATTTGCTTCCCGCTGGCCATGAGTGAGCTGCCCTTTCTGAGTCCAG 60

Qy   451 AGGGAGCCAGAGGGCCTCACATCAACAGAGGGTCTCTGAGCTCCCTGGAGCAAGGTTCGG 510
         ||||||||||||||||||||| |||||||||||||| ||||||||||||| |||| || |
Db    61 AGGGAGCCAGAGGGCCTCACAACAACAGAGGGTCTCAGAGCTCCCTGGAGGAAGGCTCAG 120

Qy   511 TCACGGGCACAGAGGCTCGGCACAGCTTAGGTGTCCTGCATGTGTCCTACAGCGTCAGGT 570
         | || ||| ||||||||||||||||||||||||||||| ||||||||| |||||||||
Db   121 TTACAGGCTCAGAGGCTCGGCACAGCTTAGGTGTCCTGAATGTGTCCTTCAGCGTCAGCA 180

Qy   571 AAGGGGAC 578
         |  | | |
Db   181 ACCGTGTC 188



RESULT 9 
AAA87503

ID AAA87503 standard; DNA; 235 BP. 
XX 

AC AAA87503; 
XX 

DT 08-JAN-2001 (first entry) 



XX 

DE Rat hepatocyte carcinogenesis biomarker nucleic acid SEQ ID NO: 427.
XX

KW Rat; phenobarbitol; carcinogenesis marker; carcinogenesis; detection;

KW identification; carcinogenic; probe; primer; ds.

XX

OS Rattus norvegicus.
XX 

PN WO200044902-A2. 
XX 

PD 03-AUG-2000. 
XX 

PF 28-JAN-2000; 2000WO-US000503.
XX

PR 29-JAN-1999; 99US-0118078P.
XX

PA (SEAR ) SEARLE & CO G D.
XX 

PI Bunch RT, Curtis SW, Rodi CP, Morris DL; 
XX 

DR WPI; 2000-505977/45. 
XX 

PT New nucleic acid encoding a carcinogenic biomarker, induced by 

PT phenobarbitol treatment of rat hepatocytes, useful for identifying 

PT carcinogenic compounds. 

XX 

PS Claim 1; Page 198; 240pp; English. 
XX 

CC AAA87080 to AAA87656 represent nucleic acid sequences (N1) encoding a

CC carcinogenesis biomarkers. The carcinogenesis biomarkers are induced by

CC treating rat hepatocytes with phenobarbitol. The nucleic acids are useful 

CC for identifying carcinogenic compounds. The nucleic acid molecules can be 

CC used to derive probes and/or primers for detecting or inducing 

CC carcinogenesis, respectively 
XX 

SQ Sequence 235 BP; 56 A; 62 C; 71 G; 46 T; 0 U; 0 Other; 

Query Match 9.6%; Score 150.8; DB 3; Length 235; 
Best Local Similarity 90.4%; Pred. No. 5.6e-37; 

Matches 161; Conservative 0; Mismatches 17; Indels 0; Gaps 0 



Qy   391 GAAAATTCACTTGCATTTGCTTCCTGCTAGCCATGGGTGAGCTGCCCTTTCTGAGTCCAG 450
         ||  |||||||  ||||||||||| ||| |||||| ||||||||||||||||||||||||
Db     1 GAGGATTCACTCACATTTGCTTCCCGCTGGCCATGAGTGAGCTGCCCTTTCTGAGTCCAG 60

Qy   451 AGGGAGCCAGAGGGCCTCACATCAACAGAGGGTCTCTGAGCTCCCTGGAGCAAGGTTCGG 510
         ||||||||||||||||||||| |||||||||||||| ||||||||||||| |||| || |
Db    61 AGGGAGCCAGAGGGCCTCACAACAACAGAGGGTCTCAGAGCTCCCTGGAGGAAGGCTCAG 120

Qy   511 TCACGGGCACAGAGGCTCGGCACAGCTTAGGTGTCCTGCATGTGTCCTACAGCGTCAG 568
         | || ||| ||||||||||||||||||||||||||||| ||||||||| |||||||||
Db   121 TTACAGGCTCAGAGGCTCGGCACAGCTTAGGTGTCCTGAATGTGTCCTTCAGCGTCAG 178



RESULT 10 
ABK51684 

ID ABK51684 standard; DNA; 1915 BP. 



XX 

AC ABK51684; 
XX 

DT 30-JUL-2002 (first entry) 
XX 

DE DNA encoding mouse ABCG5 protein. 
XX 

KW Mouse; ABCG5; ATP-binding cassette gene 5; sitosterolemia; cholesterol; 

KW arteriosclerosis; heart disease; hypersterolemia; Alzheimer's disease;

KW ds.
XX

OS Mus sp.



XX 

FH Key Location/Qualifiers

FT CDS 1..1915

FT /*tag= a

FT /partial

FT /product= "Mouse ABCG5 protein"

FT /transl_except= (pos: 1912..1915, aa: LGIVI FKVRDYLISR)

FT /note= "This sequence lacks a stop codon" 

XX 



PN WO200227016-A2.
XX 

PD 04-APR-2002. 
XX 

PF 25-SEP-2001; 2001WO-US029859.
XX

PR 25-SEP-2000; 2000US-0235268P.
XX

PA (USSH ) US DEPT HEALTH & HUMAN SERVICES.

PA (PATE/) PATEL S B.

PA (DEAN/) DEAN M. 
XX 

PI Patel SB, Dean M; 
XX 

DR WPI; 2002-416483/44. 

DR P-PSDB; AAU96985. 
XX 

PT Novel mammalian ATP-binding cassette gene 5 polypeptide, and the nucleic 

PT acid encoding the polypeptide, useful for treating sitosterolemia, 

PT arteriosclerosis and heart diseases. 
XX 

PS Example 3; Page 42-43; 66pp; English. 
XX 

CC The present invention relates to a new mammalian ATP-binding cassette 

CC gene 5 (ABCG5) polypeptide. The invention is useful for identifying a 

CC predisposition for developing sitosterolemia, arteriosclerosis or heart 

CC disease. The molecules of the invention are also useful for identifying a 

CC compound which alters ABCG5 activity level comprising contacting a cell 

CC culture or mammal which have ABCG5 polypeptide with a compound and 

CC measuring ABCG5 biological activity in the cell culture or in mammal, 

CC where an increase or decrease in ABCG5 biological activity compared to 

CC ABCG5 biological activity in a control cell culture or mammal not 

CC contacted with the compound, identifies a compound that increases or 

CC decreases ABCG5 activity respectively. The cell culture or mammal 

CC comprises a mutated ABCG5 polypeptide or a wild type polypeptide. The 

CC ABCG5 biological activity, or level of ABCG5 mRNA, or level of the 



CC polypeptide in a cell culture or mammal is also compared with that of a 

CC second cell culture or mammal comprising a wild type ABCG5 polypeptide. 

CC Stimulation of ABCG5 activity is useful for treating or preventing 

CC hypersterolemia, arteriosclerosis, heart disease and/or Alzheimer's 

CC disease. The method of the invention is useful for increasing cholesterol 

CC excretion and/or decreasing cholesterol adsorption. The present nucleic 

CC acid sequence encodes the mouse ABCG5 protein of the invention 

XX 

SQ Sequence 1915 BP; 453 A; 502 C; 484 G; 476 T; 0 U; 0 Other; 

Query Match 9.3%; Score 146.4; DB 6; Length 1915; 

Best Local Similarity 96.2%; Pred. No. 4.7e-35; 

Matches 150; Conservative 0; Mismatches 6; Indels 0; Gaps 0; 

Qy   423 ATGGGTGAGCTGCCCTTTCTGAGTCCAGAGGGAGCCAGAGGGCCTCACATCAACAGAGGG 482
         ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db     1 ATGGGTGAGCTGCCCTTTCTGAGTCCAGAGGGAGCCAGAGGGCCTCACATCAACAGAGGG 60

Qy   483 TCTCTGAGCTCCCTGGAGCAAGGTTCGGTCACGGGCACAGAGGCTCGGCACAGCTTAGGT 542
         ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    61 TCTCTGAGCTCCCTGGAGCAAGGTTCGGTCACGGGCACAGAGGCTCGGCACAGCTTAGGT 120

Qy   543 GTCCTGCATGTGTCCTACAGCGTCAGGTAAGGGGAC 578
         ||||||||||||||||||||||||||  |  | | |
Db   121 GTCCTGCATGTGTCCTACAGCGTCAGCAACCGTGTC 156



RESULT 11 
AAD48880

ID AAD48880 standard; DNA; 1959 BP.
XX

AC AAD48880;
XX 

DT 24-MAR-2003 (first entry) 
XX 

DE Mouse ABCG5 DNA. 
XX 

KW ABC family cholesterol transporter; ABCG8; sterol-related disorder; 
KW sitosterolaemia; hyperlipidaemia; hypercholesterolemia; gall stone;
KW HDL deficiency; atherosclerosis; nutritional deficiency; gene therapy;
KW mouse; ATP-binding cassette; sitosterolaemia susceptibility gene; SSG;
KW ABCG5; gene; ds.
XX

OS Mus sp.
XX 

FH Key Location/Qualifiers

FT CDS 1..1591

FT /*tag= a

FT /product= "mABCG5 protein"

XX 

PN WO200281691-A2. 
XX 

PD 17-OCT-2002. 
XX 

PF 20-NOV-2001; 2001WO-US043823.
XX

PR 20-NOV-2000; 2000US-0252235P.

PR 28-NOV-2000; 2000US-0253645P.
XX 

PA (TULA-) TULARIK INC.

PA (TEXA ) UNIV TEXAS SYSTEM. 

XX 

PI Hobbs HH, Shan B, Barnes R, Tian H; 
XX 

DR WPI; 2003-058548/05. 

DR P-PSDB; AAE31702. 
XX 

PT New ABCG8 polypeptides and nucleic acids, useful for treating sterol- 

PT related disorders e.g. sitosterolemia, hypercholesterolemia, 

PT hyperlipidemia, gall stones, HDL deficiency, atherosclerosis, or 

PT nutritional deficiencies. 

XX 

PS Claim 11; Page 73; 94pp; English. 
XX 

CC The invention relates to ATP-binding cassette (ABC) family cholesterol 

CC transporter, ABCG8 polypeptides and polynucleotides. The invention also 

CC provides ABCG5 polypeptides and polynucleotides. ABCG5 gene is also known 

CC as sitosterolaemia susceptibility gene (SSG). Sequences of the invention

CC are useful for treating or preventing sterol-related disorders such as

CC sitosterolaemia, hyperlipidaemia, hypercholesterolaemia, gall stones, HDL 

CC deficiency, atherosclerosis and nutritional deficiencies. They are also 

CC useful in gene therapy. The present sequence is mouse ABCG5 DNA 
XX 

SQ Sequence 1959 BP; 468 A; 506 C; 495 G; 490 T; 0 U; 0 Other; 

Query Match 9.3%; Score 146.4; DB 7; Length 1959; 

Best Local Similarity 96.2%; Pred. No. 4.8e-35; 

Matches 150; Conservative 0; Mismatches 6; Indels 0; Gaps 0; 

Qy        423 ATGGGTGAGCTGCCCTTTCTGAGTCCAGAGGGAGCCAGAGGGCCTCACATCAACAGAGGG 482
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db          1 ATGGGTGAGCTGCCCTTTCTGAGTCCAGAGGGAGCCAGAGGGCCTCACATCAACAGAGGG 60

Qy        483 TCTCTGAGCTCCCTGGAGCAAGGTTCGGTCACGGGCACAGAGGCTCGGCACAGCTTAGGT 542
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db         61 TCTCTGAGCTCCCTGGAGCAAGGTTCGGTCACGGGCACAGAGGCTCGGCACAGCTTAGGT 120

Qy        543 GTCCTGCATGTGTCCTACAGCGTCAGGTAAGGGGAC 578
              ||||||||||||||||||||||||||  |  | | |
Db        121 GTCCTGCATGTGTCCTACAGCGTCAGCAACCGTGTC 156



RESULT 12 
ABK51686 

ID ABK51686 standard; cDNA; 2035 BP. 
XX 

AC ABK51686; 
XX 

DT 07-AUG-2003 (revised) 

DT 30-JUL-2002 (first entry) 

XX 

DE cDNA encoding rat ABCG5 protein. 
XX 

KW Rat; ABCG5; ATP-binding cassette gene 5; sitosterolemia; cholesterol; ss; 



KW arteriosclerosis; heart disease; hypersterolemia; Alzheimer's disease. 
XX 

OS Rattus sp. 
XX 

FH Key Location/Qualifiers 

FT CDS 8..1965

FT /*tag= a 

FT /product= "Rat ABCG5 protein" 
XX 

PN WO200227016-A2. 
XX 

PD 04-APR-2002. 
XX 

PF 25-SEP-2001; 2001WO-US029859.
XX

PR 25-SEP-2000; 2000US-0235268P.
XX 

PA (USSH ) US DEPT HEALTH & HUMAN SERVICES. 

PA (PATE/) PATEL S B. 

PA (DEAN/) DEAN M. 
XX 

PI Patel SB, Dean M; 
XX 

DR WPI; 2002-416483/44. 

DR P-PSDB; AAU96986. 
XX 

PT Novel mammalian ATP-binding cassette gene 5 polypeptide, and the nucleic 

PT acid encoding the polypeptide, useful for treating sitosterolemia,

PT arteriosclerosis and heart diseases. 
XX 

PS Example 3; Page 45-46; 66pp; English. 
XX 

CC The present invention relates to a new mammalian ATP-binding cassette 

CC gene 5 (ABCG5) polypeptide. The invention is useful for identifying a 

CC predisposition for developing sitosterolemia, arteriosclerosis or heart 

CC disease. The molecules of the invention are also useful for identifying a 

CC compound which alters ABCG5 activity level comprising contacting a cell 

CC culture or mammal which have ABCG5 polypeptide with a compound and 

CC measuring ABCG5 biological activity in the cell culture or in mammal, 

CC where an increase or decrease in ABCG5 biological activity compared to 

CC ABCG5 biological activity in a control cell culture or mammal not 

CC contacted with the compound, identifies a compound that increases or 

CC decreases ABCG5 activity respectively. The cell culture or mammal 

CC comprises a mutated ABCG5 polypeptide or a wild type polypeptide. The 

CC ABCG5 biological activity, or level of ABCG5 mRNA, or level of the 

CC polypeptide in a cell culture or mammal is also compared with that of a 

CC second cell culture or mammal comprising a wild type ABCG5 polypeptide. 

CC Stimulation of ABCG5 activity is useful for treating or preventing 

CC hypersterolemia, arteriosclerosis, heart disease and/or Alzheimer's 

CC disease. The method of the invention is useful for increasing cholesterol 

CC excretion and/or decreasing cholesterol adsorption. The present nucleic 

CC acid sequence encodes the rat ABCG5 protein of the invention. (Updated on 

CC 07-AUG-2003 to correct OS field.) 

XX 

SQ Sequence 2035 BP; 481 A; 533 C; 537 G; 484 T; 0 U; 0 Other; 



Query Match 8.6%; Score 135.8; DB 6; Length 2035;

Best Local Similarity 89.6%; Pred. No. 1.1e-31;

Matches 146; Conservative 0; Mismatches 17; Indels 0; Gaps 0;



Qy        416 GCTAGCCATGGGTGAGCTGCCCTTTCTGAGTCCAGAGGGAGCCAGAGGGCCTCACATCAA 475
              ||| |||||||||||||||||||||||||||||||||||||||||||||||||||| |||
Db          1 GCTGGCCATGGGTGAGCTGCCCTTTCTGAGTCCAGAGGGAGCCAGAGGGCCTCACAACAA 60

Qy        476 CAGAGGGTCTCTGAGCTCCCTGGAGCAAGGTTCGGTCACGGGCACAGAGGCTCGGCACAG 535
              ||||||||||| ||||||||||||| |||| || || || ||| ||||||||||||||||
Db         61 CAGAGGGTCTCAGAGCTCCCTGGAGGAAGGCTCAGTTACAGGCTCAGAGGCTCGGCACAG 120

Qy        536 CTTAGGTGTCCTGCATGTGTCCTACAGCGTCAGGTAAGGGGAC 578
              ||||||||||||| ||||||||| |||||||||  |  | | |
Db        121 CTTAGGTGTCCTGAATGTGTCCTTCAGCGTCAGCAACCGTGTC 163



RESULT 13 
ABK51682 

ID ABK51682 standard; cDNA; 2516 BP. 
XX 

AC ABK51682; 
XX 

DT 30-JUL-2002 (first entry) 
XX 

DE Human ABCG5 cDNA sequence. 
XX 

KW Human; ABCG5; ATP-binding cassette gene 5; sitosterolemia; cholesterol; 

KW arteriosclerosis; heart disease; hypersterolemia; Alzheimer's disease;

KW chromosome 2p21; ss. 
XX 

OS Homo sapiens.
XX

PN WO200227016-A2.
XX

PD 04-APR-2002.
XX

PF 25-SEP-2001; 2001WO-US029859.
XX

PR 25-SEP-2000; 2000US-0235268P.
XX 

PA (USSH ) US DEPT HEALTH & HUMAN SERVICES. 

PA (PATE/) PATEL S B.

PA (DEAN/) DEAN M. 
XX 

PI Patel SB, Dean M; 
XX 

DR WPI; 2002-416483/44. 
XX 

PT Novel mammalian ATP-binding cassette gene 5 polypeptide, and the nucleic 

PT acid encoding the polypeptide, useful for treating sitosterolemia, 

PT arteriosclerosis and heart diseases. 
XX 

PS Example 3; Page 37-38; 66pp; English. 
XX 

CC The present invention relates to a new mammalian ATP-binding cassette 

CC gene 5 (ABCG5) polypeptide. The invention is useful for identifying a 

CC predisposition for developing sitosterolemia, arteriosclerosis or heart 



CC disease. The molecules of the invention are also useful for identifying a 

CC compound which alters ABCG5 activity level comprising contacting a cell 

CC culture or mammal which have ABCG5 polypeptide with a compound and 

CC measuring ABCG5 biological activity in the cell culture or in mammal, 

CC where an increase or decrease in ABCG5 biological activity compared to 

CC ABCG5 biological activity in a control cell culture or mammal not 

CC contacted with the compound, identifies a compound that increases or 

CC decreases ABCG5 activity respectively. The cell culture or mammal 

CC comprises a mutated ABCG5 polypeptide or a wild type polypeptide. The 

CC ABCG5 biological activity, or level of ABCG5 mRNA, or level of the 

CC polypeptide in a cell culture or mammal is also compared with that of a 

CC second cell culture or mammal comprising a wild type ABCG5 polypeptide. 

CC Stimulation of ABCG5 activity is useful for treating or preventing 

CC hypersterolemia, arteriosclerosis, heart disease and/or Alzheimer's 

CC disease. The method of the invention is useful for increasing cholesterol 

CC excretion and/or decreasing cholesterol adsorption. The present nucleic 

CC acid sequence represents the cDNA sequence of human ABCG5 gene located on 

CC chromosome 2p21 

XX 

SQ Sequence 2516 BP; 601 A; 631 C; 636 G; 648 T; 0 U; 0 Other; 

Query Match 6.8%; Score 107; DB 6; Length 2516; 

Best Local Similarity 63.6%; Pred. No. 1.8e-22; 

Matches 180; Conservative 0; Mismatches 100; Indels 3; Gaps 1;

Qy        294 CTGTTATCTCACGAGGATTCCAGGGCTGGGTAGGATCGGACAGGGCACTCCCATTGGCTC 353
              ||| | || || | |  | |  || | ||    |   | |  |    || |||  |||||
Db         12 CTGCTGTCCCAAGGGACTCCGGGGTCAGGTGGAGCAGGCAGGGCAGTCTGCCACGGGCTC 71

Qy        354 CTCAGTTAAAGCTGCCCTGGAGCCGGACAGGCCACTAGAAAATTCACTTGCATTTGCTTC 413
              | ||  | ||||  | |||| |  || | |||||| ||||||||  |     |||||| |
Db         72 CCCAACTGAAGCCACTCTGGGGAGGGTCCGGCCACCAGAAAATTTGCCCAGCTTTGCTGC 131

Qy        414 CTGCTAGCCATGGGTGAGCTGCCCTTTCTGAGTCCAGAGGGAGCCAGAGGGCCTCACATC 473
              ||| | ||||||||||| ||  | | | |||  || |  ||  |||  || |  ||  | 
Db        132 CTGTTGGCCATGGGTGACCTCTCATCTTTGACCCCCGGAGGGTCCATGGGTCTCCAAGTA 191

Qy        474 AACAGAGGGTCTCTGAGCTCCCTGGAGCAAGGTTCGGTCACGGGCACAGAGGCTCGGCAC 533
              |||||||| || | |||||||||||||   | | | | ||| | | | ||| ||   |||
Db        192 AACAGAGGCTCCCAGAGCTCCCTGGAGGGGGCTCCTGCCACCGCCCCGGAGCCT---CAC 248

Qy        534 AGCTTAGGTGTCCTGCATGTGTCCTACAGCGTCAGGTAAGGGG 576
              ||| | ||  |||| ||||  ||||||||||||||  |  | |
Db        249 AGCCTGGGCATCCTCCATGCCTCCTACAGCGTCAGCCACCGCG 291



RESULT 14 
AAD22010 

ID AAD22010 standard; DNA; 249 BP. 
XX 

AC AAD22010; 
XX 

DT 12-FEB-2002 (first entry) 
XX 

DE Human sitosterolaemia susceptibility gene (SSG) exon 1. 
XX 

KW Human; sitosterolaemia susceptibility gene; SSG; atherosclerosis; 



KW sterol-related disorder; hyperlipidaemia; hypercholesterolemia; therapy;

KW gall stone; coronary heart disease; cardiovascular disease; arthritis;

KW xanthoma; haemolytic anaemia; transgenic animal; ds.
XX

OS Homo sapiens.
XX 

PN WO200179272-A2. 
XX 

PD 25-OCT-2001. 
XX 

PF 18-APR-2001; 2001WO-US012758.
XX

PR 18-APR-2000; 2000US-0198465P.

PR 15-MAY-2000; 2000US-0204234P.
XX 

PA (TULA-) TULARIK INC. 
XX 

PI Tian H, Schultz J, Shan B; 
XX 

DR WPI; 2002-017598/02. 
XX 

PT Novel sitosterolemia susceptibility gene polypeptide and polynucleotide, 

PT useful for screening a compound that increases the level of expression or 

PT activity of SSG polypeptide for treating sterol-related disorder. 
XX 

PS Claim 73; Fig 14B; 105pp; English. 
XX 

CC The invention relates to an isolated Sitosterolaemia Susceptibility Gene 

CC (SSG) polypeptide. SSG is a member of adenosine triphosphate (ATP) 

CC binding cassette (ABC) family cholesterol transporter. SSG is useful for 

CC identifying a compound useful in the treatment or prevention of a sterol- 

CC related disorder, including sitosterolaemia, hyperlipidaemia, 

CC hypercholesterolemia, gall stones, HDL deficiency, atherosclerosis or 

CC nutritional deficiencies. SSG is also useful for treating cholesterol- 

CC associated diseases or conditions including coronary heart disease and 

CC other cardiovascular diseases, and sitosterolaemia-associated condition

CC including arthritis, xanthomas and chronic haemolytic anaemia. SSG 

CC expression cassette is useful in the production of transgenic non-human 

CC animals. SSG genes and their homologues are useful as tools for a number 

CC of applications including diagnosing sitosterolaemia and other 

CC cardiovascular disorders, for forensics and paternity determinations, and 

CC for treating any of a large number of SSG associated diseases. The 

CC present sequence is an exon of human SSG DNA 

XX 

SQ Sequence 249 BP; 44 A; 86 C; 74 G; 45 T; 0 U; 0 Other; 

Query Match 6.5%; Score 101.6; DB 6; Length 249; 
Best Local Similarity 68.4%; Pred. No. 2.4e-21; 

Matches 156; Conservative 0; Mismatches 69; Indels 3; Gaps 1 

Qy        341 CTCCCATTGGCTCCTCAGTTAAAGCTGCCCTGGAGCCGGACAGGCCACTAGAAAATTCAC 400
              || |||  |||||| ||  | ||||  | |||| |  || | |||||| ||||||||  |
Db         25 CTGCCACGGGCTCCCCAACTGAAGCCACTCTGGGGAGGGTCCGGCCACCAGAAAATTTGC 84

Qy        401 TTGCATTTGCTTCCTGCTAGCCATGGGTGAGCTGCCCTTTCTGAGTCCAGAGGGAGCCAG 460
                   |||||| |||| | ||||||||||| ||  | | | |||  || |  ||  ||| 
Db         85 CCAGCTTTGCTGCCTGTTGGCCATGGGTGACCTCTCATCTTTGACCCCCGGAGGGTCCAT 144

Qy        461 AGGGCCTCACATCAACAGAGGGTCTCTGAGCTCCCTGGAGCAAGGTTCGGTCACGGGCAC 520
               || |  ||  | |||||||| || | |||||||||||||   | | | | ||| | | |
Db        145 GGGTCTCCAAGTAAACAGAGGCTCCCAGAGCTCCCTGGAGGGGGCTCCTGCCACCGCCCC 204

Qy        521 AGAGGCTCGGCACAGCTTAGGTGTCCTGCATGTGTCCTACAGCGTCAG 568
               ||| ||   |||||| | ||  |||| ||||  ||||||||||||||
Db        205 GGAGCCT---CACAGCCTGGGCATCCTCCATGCCTCCTACAGCGTCAG 249



RESULT 15 
AAD22009 

ID AAD22009 standard; DNA; 2340 BP. 
XX 

AC AAD22009; 
XX 

DT 12-FEB-2002 (first entry) 
XX 

DE Human sitosterolaemia susceptibility gene (SSG).
XX

KW Human; sitosterolaemia susceptibility gene; SSG; atherosclerosis;

KW sterol-related disorder; hyperlipidaemia; hypercholesterolemia; therapy;

KW gall stone; coronary heart disease; cardiovascular disease; arthritis;

KW xanthoma; haemolytic anaemia; transgenic animal; chromosome 2p21; ds.

XX 

OS Homo sapiens. 
XX 

FH Key Location/Qualifiers 
FT CDS 107..2062

FT /*tag= a

FT /product= "Human SSG protein"

XX 

PN WO200179272-A2.
XX

PD 25-OCT-2001.
XX

PF 18-APR-2001; 2001WO-US012758.
XX

PR 18-APR-2000; 2000US-0198465P.
PR 15-MAY-2000; 2000US-0204234P.
XX

PA (TULA-) TULARIK INC.
XX 

PI Tian H, Schultz J, Shan B; 
XX 

DR WPI; 2002-017598/02. 
DR P-PSDB; AAE13290. 
XX 

PT Novel sitosterolemia susceptibility gene polypeptide and polynucleotide, 
PT useful for screening a compound that increases the level of expression or 
PT activity of SSG polypeptide for treating sterol-related disorder. 
XX 

PS Claim 8; Fig 8; 105pp; English. 
XX 

CC The invention relates to an isolated Sitosterolaemia Susceptibility Gene 

CC (SSG) polypeptide. SSG is a member of adenosine triphosphate (ATP) 

CC binding cassette (ABC) family cholesterol transporter. SSG is useful for 



CC identifying a compound useful in the treatment or prevention of a sterol- 

CC related disorder, including sitosterolaemia, hyperlipidaemia, 

CC hypercholesterolemia, gall stones, HDL deficiency, atherosclerosis or 

CC nutritional deficiencies. SSG is also useful for treating cholesterol- 

CC associated diseases or conditions including coronary heart disease and 

CC other cardiovascular diseases, and sitosterolaemia-associated condition

CC including arthritis, xanthomas and chronic haemolytic anaemia. SSG 

CC expression cassette is useful in the production of transgenic non-human 

CC animals. SSG genes and their homologues are useful as tools for a number 

CC of applications including diagnosing sitosterolaemia and other 

CC cardiovascular disorders, for forensics and paternity determinations, and 

CC for treating any of a large number of SSG associated diseases. The 

CC present sequence is human SSG DNA. Human SSG is located on chromosome 

CC 2p21 

XX 

SQ Sequence 2340 BP; 541 A; 601 C; 598 G; 600 T; 0 U; 0 Other; 

Query Match 6.5%; Score 101.6; DB 6; Length 2340; 

Best Local Similarity 67.4%; Pred. No. 8.9e-21; 

Matches 159; Conservative 0; Mismatches 74; Indels 3; Gaps 1 

Qy        341 CTCCCATTGGCTCCTCAGTTAAAGCTGCCCTGGAGCCGGACAGGCCACTAGAAAATTCAC 400
              || |||  |||||| ||  | ||||  | |||| |  || | |||||| ||||||||  |
Db         25 CTGCCACGGGCTCCCCAACTGAAGCCACTCTGGGGAGGGTCCGGCCACCAGAAAATTTGC 84

Qy        401 TTGCATTTGCTTCCTGCTAGCCATGGGTGAGCTGCCCTTTCTGAGTCCAGAGGGAGCCAG 460
                   |||||| |||| | ||||||||||| ||  | | | |||  || |  ||  ||| 
Db         85 CCAGCTTTGCTGCCTGTTGGCCATGGGTGACCTCTCATCTTTGACCCCCGGAGGGTCCAT 144

Qy        461 AGGGCCTCACATCAACAGAGGGTCTCTGAGCTCCCTGGAGCAAGGTTCGGTCACGGGCAC 520
               || |  ||  | |||||||| || | |||||||||||||   | | | | ||| | | |
Db        145 GGGTCTCCAAGTAAACAGAGGCTCCCAGAGCTCCCTGGAGGGGGCTCCTGCCACCGCCCC 204

Qy        521 AGAGGCTCGGCACAGCTTAGGTGTCCTGCATGTGTCCTACAGCGTCAGGTAAGGGG 576
               ||| ||   |||||| | ||  |||| ||||  ||||||||||||||  |  | |
Db        205 GGAGCCT---CACAGCCTGGGCATCCTCCATGCCTCCTACAGCGTCAGCCACCGCG 257



RESULT 16 
AAD48882 

ID AAD48882 standard; DNA; 2340 BP. 
XX 

AC AAD48882; 
XX 

DT 24-MAR-2003 (first entry) 
XX 

DE Human ABCG5 DNA. 
XX 

KW ABC family cholesterol transporter; ABCG8; sterol-related disorder; 

KW sitosterolaemia; hyperlipidaemia; hypercholesterolemia; gall stone;

KW HDL deficiency; atherosclerosis; nutritional deficiency; gene therapy;

KW human; ATP-binding cassette; sitosterolaemia susceptibility gene; SSG;

KW ABCG5; gene; ds.
XX

OS Homo sapiens.
XX 

FH Key Location/Qualifiers

FT CDS 107..2062

FT /*tag= a

FT /product= "hABCG5 protein"


XX 

PN WO200281691-A2. 
XX 

PD 17-OCT-2002. 
XX 

PF 20-NOV-2001; 2001WO-US043823.
XX

PR 20-NOV-2000; 2000US-0252235P.

PR 28-NOV-2000; 2000US-0253645P.
XX 

PA (TULA-) TULARIK INC. 

PA (TEXA ) UNIV TEXAS SYSTEM. 

XX 

PI Hobbs HH, Shan B, Barnes R, Tian H; 
XX 

DR WPI; 2003-058548/05. 

DR P-PSDB; AAE31704. 
XX 

PT New ABCG8 polypeptides and nucleic acids, useful for treating sterol- 

PT related disorders e.g. sitosterolemia, hypercholesterolemia, 

PT hyperlipidemia, gall stones, HDL deficiency, atherosclerosis, or 

PT nutritional deficiencies. 

XX 

PS Claim 11; Page 77; 94pp; English. 
XX 

CC The invention relates to ATP-binding cassette (ABC) family cholesterol 

CC transporter, ABCG8 polypeptides and polynucleotides. The invention also 

CC provides ABCG5 polypeptides and polynucleotides. ABCG5 gene is also known 

CC as sitosterolaemia susceptibility gene (SSG). Sequences of the invention

CC are useful for treating or preventing sterol-related disorders such as 

CC sitosterolaemia, hyperlipidaemia, hypercholesterolemia, gall stones, HDL 

CC deficiency, atherosclerosis and nutritional deficiencies. They are also 

CC useful in gene therapy. The present sequence is human ABCG5 DNA 
XX 

SQ Sequence 2340 BP; 541 A; 601 C; 598 G; 600 T; 0 U; 0 Other; 

Query Match 6.5%; Score 101.6; DB 7; Length 2340; 
Best Local Similarity 67.4%; Pred. No. 8.9e-21; 

Matches 159; Conservative 0; Mismatches 74; Indels 3; Gaps 1 



Qy        341 CTCCCATTGGCTCCTCAGTTAAAGCTGCCCTGGAGCCGGACAGGCCACTAGAAAATTCAC 400
              || |||  |||||| ||  | ||||  | |||| |  || | |||||| ||||||||  |
Db         25 CTGCCACGGGCTCCCCAACTGAAGCCACTCTGGGGAGGGTCCGGCCACCAGAAAATTTGC 84

Qy        401 TTGCATTTGCTTCCTGCTAGCCATGGGTGAGCTGCCCTTTCTGAGTCCAGAGGGAGCCAG 460
                   |||||| |||| | ||||||||||| ||  | | | |||  || |  ||  ||| 
Db         85 CCAGCTTTGCTGCCTGTTGGCCATGGGTGACCTCTCATCTTTGACCCCCGGAGGGTCCAT 144

Qy        461 AGGGCCTCACATCAACAGAGGGTCTCTGAGCTCCCTGGAGCAAGGTTCGGTCACGGGCAC 520
               || |  ||  | |||||||| || | |||||||||||||   | | | | ||| | | |
Db        145 GGGTCTCCAAGTAAACAGAGGCTCCCAGAGCTCCCTGGAGGGGGCTCCTGCCACCGCCCC 204

Qy        521 AGAGGCTCGGCACAGCTTAGGTGTCCTGCATGTGTCCTACAGCGTCAGGTAAGGGG 576
               ||| ||   |||||| | ||  |||| ||||  ||||||||||||||  |  | |
Db        205 GGAGCCT---CACAGCCTGGGCATCCTCCATGCCTCCTACAGCGTCAGCCACCGCG 257



RESULT 17 
ABN90022/c

ID ABN90022 standard; cDNA; 2564 BP. 
XX 

AC ABN90022; 
XX 

DT 16-AUG-2002 (first entry) 
XX 

DE Mouse clone IMX3_67 extended sequence. 
XX 

KW Mouse; antiinflammatory; gene therapy; ileitis; DST; ss; TOGA; 

KW digital sequence tag; total gene expression analysis. 

XX 

OS Mus musculus. 
XX 

PN WO200231114-A2. 
XX 

PD 18-APR-2002.
XX

PF 11-OCT-2001; 2001WO-US032091.
XX

PR 11-OCT-2000; 2000US-0239483P.
XX 

PA (DIGI-) DIGITAL GENE TECHNOLOGIES INC. 
XX 

PI Viney JL, Sims JE, Dubose RF, Baum PR, Hasel KW, Hilbush BS;
XX 

DR WPI; 2002-426279/45. 
XX 

PT New isolated nucleic acid molecules that are associated with ileitis, for 

PT preventing, treating, modulating and diagnosing ileitis in a mammalian 

PT subject. 
XX 

PS Claim 1; Page 266-268; 273pp; English. 
XX 

CC The invention relates to a novel isolated nucleic acid molecule 

CC comprising a polynucleotide having one of 90 polynucleotide sequences, 

CC given in the specification. The polynucleotides of the invention have 

CC antiinflammatory activity, and may have a use in gene therapy. The 

CC polynucleotide or a polypeptide encoded by it is used for preventing, 

CC treating, modulating or ameliorating a medical condition such as ileitis. 

CC The polypeptide or polynucleotide is also useful for manufacturing a 

CC medicament for treating ileitis. The sequence represents an extended

CC cDNA digital sequence tag obtained from a mouse clone by the TOGA (total 

CC gene expression analysis) method 

XX 

SQ Sequence 2564 BP; 623 A; 722 C; 638 G; 581 T; 0 U; 0 Other; 

Query Match 6.2%; Score 97; DB 6; Length 2564; 
Best Local Similarity 100.0%; Pred. No. 2.7e-19; 

Matches 97; Conservative 0; Mismatches 0; Indels 0; Gaps 0 

Qy          1 CGAAGCATCCTGAAGTACAGTCCCATTCCACAGCTGGGTCTCTTCTTTGGTTTTCTCAGC 60
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db         97 CGAAGCATCCTGAAGTACAGTCCCATTCCACAGCTGGGTCTCTTCTTTGGTTTTCTCAGC 38

Qy         61 CATGACCAGTGCTGTTTGTGCCCTTTGTGTGGCCTCC 97
              |||||||||||||||||||||||||||||||||||||
Db         37 CATGACCAGTGCTGTTTGTGCCCTTTGTGTGGCCTCC 1

RESULT 18 
ABK51681 

ID ABK51681 standard; DNA; 1920 BP. 
XX 

AC ABK51681; 
XX 

DT 30-JUL-2002 (first entry) 
XX 

DE DNA encoding human ABCG5 protein. 
XX 

KW Human; ABCG5; ATP-binding cassette gene 5; sitosterolemia; cholesterol; 

KW arteriosclerosis; heart disease; hypersterolemia; Alzheimer's disease; 

KW chromosome 2p21; ds.
XX 

OS Homo sapiens. 



XX 

FH Key Location/Qualifiers 

FT CDS 1..1920

FT /*tag= a

FT /product= "Human ABCG5 protein"

FT /transl_except= (pos:4..9, aa:GDLSSLTPGGSMGL)

FT /note= "This sequence contains 13 exons"

XX 



PN WO200227016-A2.
XX

PD 04-APR-2002.
XX

PF 25-SEP-2001; 2001WO-US029859.
XX

PR 25-SEP-2000; 2000US-0235268P.
XX 

PA (USSH ) US DEPT HEALTH & HUMAN SERVICES. 

PA (PATE/) PATEL S B. 

PA (DEAN/) DEAN M. 
XX 

PI Patel SB, Dean M; 
XX 

DR WPI; 2002-416483/44. 

DR P-PSDB; AAU98984. 
XX 

PT Novel mammalian ATP-binding cassette gene 5 polypeptide, and the nucleic 

PT acid encoding the polypeptide, useful for treating sitosterolemia, 

PT arteriosclerosis and heart diseases. 
XX 

PS Claim 38; Page 36-37; 66pp; English. 
XX 

CC The present invention relates to a new mammalian ATP-binding cassette 

CC gene 5 (ABCG5) polypeptide. The invention is useful for identifying a 

CC predisposition for developing sitosterolemia, arteriosclerosis or heart 

CC disease. The molecules of the invention are also useful for identifying a 



CC compound which alters ABCG5 activity level comprising contacting a cell 

CC culture or mammal which have ABCG5 polypeptide with a compound and 

CC measuring ABCG5 biological activity in the cell culture or in mammal, 

CC where an increase or decrease in ABCG5 biological activity compared to 

CC ABCG5 biological activity in a control cell culture or mammal not 

CC contacted with the compound, identifies a compound that increases or 

CC decreases ABCG5 activity respectively. The cell culture or mammal 

CC comprises a mutated ABCG5 polypeptide or a wild type polypeptide. The 

CC ABCG5 biological activity, or level of ABCG5 mRNA, or level of the 

CC polypeptide in a cell culture or mammal is also compared with that of a 

CC second cell culture or mammal comprising a wild type ABCG5 polypeptide. 

CC Stimulation of ABCG5 activity is useful for treating or preventing 

CC hypersterolemia, arteriosclerosis, heart disease and/or Alzheimer's 

CC disease. The method of the invention is useful for increasing cholesterol 

CC excretion and/or decreasing cholesterol adsorption. The present nucleic 

CC acid sequence represents the human ABCG5 gene located on chromosome 2p21. 

CC This sequence encodes the human ABCG5 protein of the invention 

XX 

SQ Sequence 1920 BP; 440 A; 503 C; 486 G; 491 T; 0 U; 0 Other; 

Query Match 5.9%; Score 93; DB 6; Length 1920; 

Best Local Similarity 84.0%; Pred. No. 4.3e-18; 

Matches 105; Conservative 0; Mismatches 20; Indels 0; Gaps 0;

Qy       1162 AGCAACCGTGTCGGGCCTTGGTGGAACATCAAATCATGCCAGCAGAAGTGGGACAGGCAA 1221
              ||| |||| ||  |||| |||||| |||||| ||| |||| |||| |||||  |||||| 
Db        106 AGCCACCGCGTGAGGCCCTGGTGGGACATCACATCTTGCCGGCAGCAGTGGACCAGGCAG 165

Qy       1222 ATCCTCAAAGATGTCTCCTTGTACATCGAGAGTGGCCAGATTATGTGCATCTTAGGCAGC 1281
              |||||||||||||||||||||||| | ||||| || ||||| ||||||||| |||| |||
Db        166 ATCCTCAAAGATGTCTCCTTGTACGTGGAGAGCGGGCAGATCATGTGCATCCTAGGAAGC 225

Qy       1282 TCAGG 1286
              |||||
Db        226 TCAGG 230



RESULT 19 
AAD22011 

ID AAD22011 standard; DNA; 122 BP. 
XX 

AC AAD22011; 
XX 

DT 12-FEB-2002 (first entry) 
XX 

DE Human sitosterolaemia susceptibility gene (SSG) exon 2.
XX

KW Human; sitosterolaemia susceptibility gene; SSG; atherosclerosis;

KW sterol-related disorder; hyperlipidaemia; hypercholesterolemia; therapy;

KW gall stone; coronary heart disease; cardiovascular disease; arthritis;

KW xanthoma; haemolytic anaemia; transgenic animal; ds.

XX 

OS Homo sapiens. 
XX 

PN WO200179272-A2. 
XX 

PD 25-OCT-2001. 



XX 

PF 18-APR-2001; 2001WO-US012758.
XX

PR 18-APR-2000; 2000US-0198465P.

PR 15-MAY-2000; 2000US-0204234P.
XX 

PA (TULA-) TULARIK INC. 
XX 

PI Tian H, Schultz J, Shan B; 
XX 

DR WPI; 2002-017598/02. 
XX 

PT Novel sitosterolemia susceptibility gene polypeptide and polynucleotide, 

PT useful for screening a compound that increases the level of expression or 

PT activity of SSG polypeptide for treating sterol-related disorder. 
XX 

PS Claim 73; Fig 14B; 105pp; English. 
XX 

CC The invention relates to an isolated Sitosterolaemia Susceptibility Gene 

CC (SSG) polypeptide. SSG is a member of adenosine triphosphate (ATP) 

CC binding cassette (ABC) family cholesterol transporter. SSG is useful for 

CC identifying a compound useful in the treatment or prevention of a sterol- 

CC related disorder, including sitosterolaemia, hyperlipidaemia, 

CC hypercholesterolaemia, gall stones, HDL deficiency, atherosclerosis or 

CC nutritional deficiencies. SSG is also useful for treating cholesterol- 

CC associated diseases or conditions including coronary heart disease and 

CC other cardiovascular diseases, and sitosterolaemia-associated condition

CC including arthritis, xanthomas and chronic haemolytic anaemia. SSG 

CC expression cassette is useful in the production of transgenic non-human 

CC animals. SSG genes and their homologues are useful as tools for a number 

CC of applications including diagnosing sitosterolaemia and other 

CC cardiovascular disorders, for forensics and paternity determinations, and 

CC for treating any of a large number of SSG associated diseases. The 

CC present sequence is an exon of human SSG DNA 

XX 

SQ Sequence 122 BP; 27 A; 34 C; 38 G; 23 T; 0 U; 0 Other; 

Query Match 5.7%; Score 90; DB 6; Length 122; 
Best Local Similarity 83.6%; Pred. No. 7.7e-18; 

Matches 102; Conservative 0; Mismatches 20; Indels 0; Gaps 0;

Qy       1164 CAACCGTGTCGGGCCTTGGTGGAACATCAAATCATGCCAGCAGAAGTGGGACAGGCAAAT 1223
              | |||| ||  |||| |||||| |||||| ||| |||| |||| |||||  |||||| ||
Db          1 CCACCGCGTGAGGCCCTGGTGGGACATCACATCTTGCCGGCAGCAGTGGACCAGGCAGAT 60

Qy       1224 CCTCAAAGATGTCTCCTTGTACATCGAGAGTGGCCAGATTATGTGCATCTTAGGCAGCTC 1283
              |||||||||||||||||||||| | ||||| || ||||| ||||||||| |||| |||||
Db         61 CCTCAAAGATGTCTCCTTGTACGTGGAGAGCGGGCAGATCATGTGCATCCTAGGAAGCTC 120

Qy       1284 AG 1285
              ||
Db        121 AG 122



RESULT 20
AAD48881/c

ID AAD48881 standard; DNA; 2019 BP.
XX

AC AAD48881; 
XX 

DT 24-MAR-2003 (first entry) 
XX 

DE Mouse ABCG8 DNA. 
XX 

KW ABC family cholesterol transporter; ABCG8; sterol-related disorder; 

KW sitosterolaemia; hyperlipidaemia; hypercholesterolaemia; gall stone; 

KW HDL deficiency; atherosclerosis; nutritional deficiency; gene therapy; 

KW mouse; ATP-binding cassette; sitosterolaemia susceptibility gene; SSG; 

KW ABCG5; gene; ds.
XX

OS Mus sp.
XX

FH Key Location/Qualifiers

FT CDS 1..2019

FT /*tag= a

FT /product= "mABCG8 protein"

FT /transl_except= (pos:1318..1320, aa:Leu)
XX 

PN WO200281691-A2. 
XX 

PD 17-OCT-2002. 
XX 

PF 20-NOV-2001; 2001WO-US043823.
XX

PR 20-NOV-2000; 2000US-0252235P.

PR 28-NOV-2000; 2000US-0253645P.
XX 

PA (TULA-) TULARIK INC. 

PA (TEXA ) UNIV TEXAS SYSTEM. 

XX 

PI Hobbs HH, Shan B, Barnes R, Tian H; 
XX 

DR WPI; 2003-058548/05. 

DR P-PSDB; AAE31703. 
XX 

PT New ABCG8 polypeptides and nucleic acids, useful for treating sterol- 

PT related disorders e.g. sitosterolemia, hypercholesterolemia, 

PT hyperlipidemia, gall stones, HDL deficiency, atherosclerosis, or 

PT nutritional deficiencies. 

XX 

PS Claim 13; Page 75; 94pp; English. 

XX 

CC The invention relates to ATP-binding cassette (ABC) family cholesterol 

CC transporter, ABCG8 polypeptides and polynucleotides. The invention also 

CC provides ABCG5 polypeptides and polynucleotides. ABCG5 gene is also known 

CC as sitosterolaemia susceptibility gene (SSG). Sequences of the invention

CC are useful for treating or preventing sterol-related disorders such as 

CC sitosterolaemia, hyperlipidaemia, hypercholesterolaemia, gall stones, HDL 

CC deficiency, atherosclerosis and nutritional deficiencies. They are also 

CC useful in gene therapy. The present sequence is mouse ABCG8 DNA 
XX 

SQ Sequence 2019 BP; 444 A; 598 C; 510 G; 467 T; 0 U; 0 Other; 



Query Match 4.0%; Score 63; DB 7; Length 2019;

Best Local Similarity 100.0%; Pred. No. 1.5e-08;

Matches 63; Conservative 0; Mismatches 0; Indels 0; Gaps 0;

Qy          1 CGAAGCATCCTGAAGTACAGTCCCATTCCACAGCTGGGTCTCTTCTTTGGTTTTCTCAGC 60
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db         63 CGAAGCATCCTGAAGTACAGTCCCATTCCACAGCTGGGTCTCTTCTTTGGTTTTCTCAGC 4

Qy         61 CAT 63
              |||
Db          3 CAT 1



RESULT 21
ADA71938

ID ADA71938 standard; DNA; 2000 BP.
XX

AC ADA71938;
XX

DT 20-NOV-2003 (first entry)
XX

DE Rice gene, SEQ ID 5263.
XX

KW Plant; bacterial infection; fungal infection; viral infection; rice;

KW gene; ds.

XX

OS Oryza sativa.
XX

PN WO2003000898-A1.
XX

PD 03-JAN-2003.
XX

PF 22-JUN-2001; 2001WO-IB001105.
XX

PR 22-JUN-2001; 2001WO-IB001105.
XX

PA (SYGN ) SYNGENTA PARTICIPATIONS AG.
XX

PI Chang H, Chen W, Cooper B, Glazebrook J, Goff SA, Hou Y;

PI Katagiri F, Quan S, Tao Y, Whitham S, Xie Z, Zhu T, Zou G;
XX

DR WPI; 2003-175290/17.
XX

PT Identifying at least one gene involved in plant resistance or response to

PT pathogenic infection for conferring resistance or tolerance to a plant to

PT bacterial, fungal or viral infection by determining or detecting plant

PT gene expression.
XX

PS Claim 27; SEQ ID NO 5263; 899pp; English.
XX

CC The present invention relates to a method (M1) for identifying genes

CC involved in plant resistance or response to pathogenic infection. M1

CC comprises identifying a gene whose expression is significantly altered in

CC the incompatible interaction of plant gene expression relative to

CC expression of the gene in an uninfected plant, in a mutant plant that

CC does not express a gene associated with response to pathogenic infection,

CC or in a corresponding incompatible or compatible interaction. (M1) is

CC useful for conferring resistance to resistance or tolerance to a plant to

CC bacterial, fungal or viral infection. The present sequence was used to

CC illustrate the invention.

XX 

SQ Sequence 2000 BP; 336 A; 265 C; 284 G; 363 T; 0 U; 752 Other; 



Query Match 2.8%; Score 44.6; DB 7; Length 2000; 

Best Local Similarity 10.2%; Pred. No. 0.01; 

Matches 95; Conservative 412; Mismatches 416; Indels 8; Gaps 4; 



Qy 561 AGCGTCAGGTAAGGGGACCTCCACAGCAAAAAGCTAGGCTCTCTGATTGCCTTTTCTGAA 620 

: | ::::::::: : : I : : : : I :::::: I : : | : I : : : : : | 

Db 106 RGMRRSRMRWMGRYRRCARSGRMAGGSGRMMGGKSRMSYWMWCYARGCGSCKRKKSKGGS 165 

Qy 621 TGGGTGGGTGGGCCTGTGGGCTTTGGGTTGTCTGTCCAGCAGATCAGGGTGAAAGTGGAC 680 

: | : : : | : | | : : : I : : : : : : : : : II" : : : I : : : I 

Db 166 WGKTCRRGARGGSGWSSGAKYKSGSMSKRMWMSSCGRSGCGRRSAYSRYYGTSRKYGTYK 225 

Qy 681 AGTCTGTAAC-AACAGTGAGTCGTTCCTCCTCCTCCTCCTGCGCAGGGCAGAGCCTGGAC 739 

Db 226 KMTYYSAS RCMRAYMTTSYSWACSSYTWCRSKRRSMMWKMMRKMRWSRSYGWYSWSYKM^ 285 

Qy 740 ATTAAAACATGCCCTGCCTGAAGCCGCTTGCTGCTTCTCACTGATTTCTGCTCTCCCCTT 799 

Db 286 MCTAYKKSYYSRWCYMYRGGGWRGATRYWGRGYMSRMAMMYKKMYWYRGYKGMKRGWWAG 345 

Qy 800 CCTTGACTCGCCCACCACCTGTCCTGTGTAGATGGAGAAGGCTCGGAGAGTGGGGGTGCT 859 

Db 346 RMMMRSMCRWSKACYYMRWRMWRMTRRRRWAKKSSRTS 405 

Qy 860 GGGGGCACAAAATGGAATGAACACTGCTGAAGGAATGCAGGGTTCACTTCAAGAAGAAAG 919 

Db 406 MRSCKRARWMKRCRSGRAWKMGCRGCMTCRMKSYGMMRWKSW 465 

Qy 920 CAGTGTGCAGGTGTACCATCTCCCAGTCAGAGACCCAGTAATCAGAGCAGCTAATGGGAG 979 

Db 466 KKCSRTTMWGKTRGGMMGTMGRCRYKKRSGMKRKCRRRRWGRMYRMRWKRYYMSARYTMR 525 

Qy 980 GCATGCTCCTTGGGTGGTGGCCAACTTGTCATTATACCTCCAAGGACAACAGAGTGGTAC 1039 

Db 526 YCARKKYSYSAARKARCWYRGKGYYWAGMWM^ 585 

Qy 1040 ATAAGGCTAAAACAGAGTTGTCAACCTGTCCAGGGGCAACTGGGATGGGGTAGGGCTGGG 1099 

: | : | I : I I I : :::::::::::::: I : I :: : : : : I I : i : I : 
Db 586 YYA--SCMKSARKAGAKMCKRSKMSAWSKSMRSSRKCRKCASKRSSAKRYAMMGGMTSGS 643 

Qy 1100 AGCAGGGGTCTGGCACCTTCCAGGACCCTACTCTGCCTTTGCCCTTGTGGGATTTCCTTT 1159 

: : : : : | | : : : : : : : : : : : | : : : : : : : : : : : : : : 

Db 644 RMSRWKSYTCYWRKWGSMKSTCTWMYYMSKYTYAKYGSYWRYRY---RAWCMYMWRWYYY 700 

Qy 1160 AAAGCAACCGTGTCGGGCCTTGGTGGAACATCAAATCATGCCAGCAGAAGTGGGACAGGC 1219 

:::::::: I ::: : II : : : : : I : : : I : I ::::::::::: I : 
Db 701 RYRSYMTYMAWYTSSTRMAMTGMKYSGRYWTSWYKYC--KCSWKYRSMWYYWSWWWAKTW 758 

Qy 1220 AAATCCTCAAAGATGTCTCCTTGTACATCGAGAGTGGCCAGATTATGTGCATCTTAGGCA 1279 

Db 759 KMWRRYATRMMWMWYRYSMKWYTWCTMWGYWWYWWRTYMKMRYMW 818 



Qy 1280 GCTCAGGTAAGTGCCTGGGGGGSCSGGGGCTCCTGTACTTCTAAGGCAGGCTCTGGGAGG 1339 

|| : : 1 : 1 : : 1 1 : : : = = : : : : 1 : 1 : 1 : : : 
Db 819 TGTWAAWWMAKTKMRMGMTGAKTRGRARK7VRYWWKWATW 878 

Qy 1340 CTTTGGCTCYGTCTAAGCACAATGTTTAAGAAGTRAGTTTAAGTTGTAGAGAGGCAGCCA 1399 

Db 879 KAWRKYYWSWMRAWYYYYKTRRTRY 938 

Qy 1400 TGCATTTGGCATTTGAATACAATCTGGTGACTTGTCTGGCTGCCAATAGAACCTAGTACC 1459 

: ... . . | | | . | . . i . . . . . . . i . i i . . 

Db 939 WT K YW YW YCT T WKT^CGRAT K YMC C AGWW^ 998 

Qy 1460 AAAGTGAAATCTTGAGGAAAATCCCTGGAAA 1490 

: : : : 1 : 1 : : : : : 1 : 1 III II 

Db 999 MMWKTRAWSKSYARAYWKMAGCACCTACACA 1029 





RESULT 22 
ADA71938/c 

ID ADA71938 standard; DNA; 2000 BP. 
XX 

AC ADA71938; 
XX 

DT 20-NOV-2003 (first entry) 
XX 

DE Rice gene, SEQ ID 5263. 
XX 

KW Plant; bacterial infection; fungal infection; viral infection; rice; 

KW gene; ds. 

XX 

OS Oryza sativa. 
XX 

PN WO2003000898-A1. 
XX 

PD 03-JAN-2003. 
XX 

PF 22-JUN-2001; 2001WO-IB001105. 
XX 

PR 22-JUN-2001; 2001WO-IB001105. 
XX 

PA (SYGN ) SYNGENTA PARTICIPATIONS AG. 
XX 

PI Chang H, Chen W, Cooper B, Glazebrook J, Goff SA, Hou Y; 

PI Katagiri F, Quan S, Tao Y, Whitham S, Xie Z, Zhu T, Zou G; 
XX 

DR WPI; 2003-175290/17. 
XX 

PT Identifying at least one gene involved in plant resistance or response to 

PT pathogenic infection for conferring resistance or tolerance to a plant to 

PT bacterial, fungal or viral infection by determining or detecting plant 

PT gene expression. 
XX 

PS Claim 27; SEQ ID NO 5263; 899pp; English. 
XX 

CC The present invention relates to a method (M1) for identifying genes 

CC involved in plant resistance or response to pathogenic infection. M1 

CC comprises identifying a gene whose expression is significantly altered in 



CC the incompatible interaction of plant gene expression relative to 

CC expression of the gene in an uninfected plant, in a mutant plant that 

CC does not express a gene associated with response to pathogenic infection, 

CC or in a corresponding incompatible or compatible interaction. (M1) is 

CC useful for conferring resistance to resistance or tolerance to a plant to 

CC bacterial, fungal or viral infection. The present sequence was used to 

CC illustrate the invention. 

XX 

SQ Sequence 2000 BP; 336 A; 265 C; 284 G; 363 T; 0 U; 752 Other; 

Query Match 2.4%; Score 38.2; DB 7; Length 2000; 

Best Local Similarity 8.3%; Pred. No. 1.1; 

Matches 54; Conservative 292; Mismatches 300; Indels 1; Gaps 1; 

Qy 233 AGATAAGGACACTCTGGCTAAAGGTACATCAGATAATGGCATCGTTGGCCAAATTGGTGA 292 

: : : : : | : | : : :::::: : | : : ::::::::: : | | : : 
Db 651 RSMWYSKYSCSAKCCKKTRYMTSSYMSTGMYGMYSSYKSMSWTSKMSYMGKMTCTMYTSM 592 



Qy 293 ACTGTTATCTCACGAGGATTCCAGGGCTGGGTAGGATCGGACAGGGCACTCCCATTGGCT 352 

Db 591 KGSTRRSKMGRWSGMSRMYMRWWKKMRKRKYMRYMKWKCTWRRCMCYRWGYTMYTTSRSR 532 

Qy 353 CCTCAGTTAAAGCTGCCCTGGAGCCGGACAGGCCACTAGAAAATTCACTTGCATTTGCTT 412 

Db 531 MMYTGRYKARYTSKRRYMWYKYRKYCWYYYYGMYMKCSYMMRYGYCKACKKCCYAMCWKA 472 

Qy 413 CCTGCTAGCCATGGGTGAGCTGCCCTTTCTGAGTCCAGAGGGAGCCAGAGGGCCTCACAT 472 

: | : : :::::::: | :: : :::::: 1 I : I I I : : : : : I 

Db 471 AYSGMMMYWYRKYSKWMRMSTKYMWSMWYKKC 412 

Qy 473 CAACAGAGGGTCTCTGAGCTCCCTGGAGCAAGGTTCGGTCACGGGCACAGAGGCTCGGCA 532 

Db 411 YMGSYKYSRCYKYMRMYMYKGWMYMMYYSAYSSMMTWYYYYAKYWKYWYKRRGTMSWYGK 352 



Qy 533 CAGCTTAGGTGTCCTGCATGTGTCCTACAGCGTCAGGTAAGGGGACCTCCACAGCAAAAA 592 

Db 351 SYKKKYCTWWCYMKCMRCYRWRKMMRKKTKYSKRCYCWRYATCYWCCCYRKRGWYSRRSM 292 

Qy 593 GCTAGGCTCTCTGATTGCCTTTTCTGAATGGGTGGGTGGGCCTGTGGGCTTTG-GGTTGT 651 

|||::::::::::::::::::::::: : : | : | | : : : : : : : : I 
Db 291 MRTAGKWKMRSWSRWCRSYSWYKMYKKMWKKSYYMSYGWARSSGTWSRSAAKRTYKGYST 232 

Qy 652 CTGTCCAGCAGATCAGGGTGAAAGTGGACAGTCTGTAACAACAGTGAGTCGTTCCTCCTC 711 

Db 231 SRRAKMMRACRMYSACRRYSRTSYYCGCSYCGSSKWKYMSKSCSMRMTCSSWCSCCYTCY 172 

Qy 712 CTCCTCCTGCGCAGGGCAGAGCCTGGACATTAAAACATGCCCTGCCTGAAGCCGCTTGCT 771 

: : : : : : : | : | : : : : : : : I I I : : i : I : : : 

Db 171 YGAMCWSCCMSMMYMGSCGCYTRGWKWRSKYSMCCKKYCSCCTKYCSYTGYYRYCKWYKY 112 

Qy 772 GCTTCTCACTGATTTCTGCTCTCCCCTTCCTTGACTCGCCCACCACCTGTCCTGTGTAGA 831 

Db 111 SYYKCYYCYCYWYMSYMRYMMKCMCSRSCSSWMSCAYCSTSSTSRWMSMYYAAKMGMCGS 52 

Qy 832 TGGAGAAGGCTCGGAGAGTGGGGGTGCTGGGGGCACAAAATGGAATG 878 

Db 51 SGMYRMSKSCKMYSKYSSCKYTGSKKCTKRKYYYCYWSSGYSMWCTS 5 



RESULT 23 
ADE57382/c 

ID ADE57382 standard; DNA; 2692 BP. 
XX 

AC ADE57382; 
XX 

DT 29-JAN-2004 (first entry) 
XX 

DE Rat gene L38644, SEQ ID NO 3243. 
XX 

KW Rat; ds; gene; pain; neuronal tissue; gene therapy; 

KW spinal segmental nerve injury; chronic constriction injury; CCI; 

KW spared nerve injury; SNI; Chung. 

XX 

OS Rattus norvegicus. 
XX 

PN WO2003016475-A2. 
XX 

PD 27-FEB-2003. 
XX 

PF 14-AUG-2002; 2002WO-US025765. 
XX 

PR 14-AUG-2001; 2001US-0312147P. 

PR 01-NOV-2001; 2001US-0346382P. 

PR 26-NOV-2001; 2001US-0333347P. 
XX 

PA (GEHO ) GEN HOSPITAL CORP. 

PA (FARB ) BAYER AG. 

XX 

PI Woolf C, D'urso D, Befort K, Costigan M; 
XX 

DR WPI; 2003-268312/26. 

DR GENBANK; L38644. 
XX 

PT New composition comprising two or more isolated polypeptides, useful for 

PT preparing a medicament for treating pain in an animal. 

XX 

PS Claim 1; Page; 1017pp; English. 
XX 

CC The invention discloses a composition comprising two or more isolated rat 

CC or human polynucleotides or a polynucleotide which represents a fragment, 

CC derivative or allelic variation of the nucleic acid sequence. Also 

CC claimed are a vector comprising the novel polynucleotide, a host cell 

CC comprising the vector, a method for identifying a nucleotide sequence 

CC which is differentially regulated in an animal subjected to pain and a 

CC kit to perform the method, an array, a method for identifying an agent 

CC that increases or decreases the expression of the polynucleotide sequence 

CC that is differentially expressed in neuronal tissue of a first animal 

CC subjected to pain, a method for identifying a compound which regulates 

CC the expression of a polynucleotide sequence which is differentially 

CC expressed in an animal subjected to pain, a method for identifying a 

CC compound that regulates the activity of one or more of the 

CC polynucleotides, a method for producing a pharmaceutical composition, a 

CC method for identifying a compound or small molecule that regulates the 

CC activity in an animal of one or more of the polypeptides given in the 



CC specification, a method for identifying a compound useful in treating 

CC pain and a pharmaceutical composition comprising the one or more 

CC polypeptides or their antibodies. The polynucleotide or the compound that 

CC modulates its activity is useful for preparing a medicament for treating 

CC pain (e.g. spinal segmental nerve injury (Chung), chronic constriction 

CC injury (CCI) and spared nerve injury (SNI)) in an animal (e.g. gene 

CC therapy). The sequence presented is a rat DNA (shown in Table 2 of the 

CC specification) which encodes one of the polypeptides of the invention 

CC which is differentially expressed during pain. Note: The sequence data 

CC for this patent did not form part of the printed specification, but was 

CC obtained in electronic form directly from WIPO at 

CC ftp.wipo.int/pub/published_pct_sequences. 
XX 

SQ Sequence 2692 BP; 738 A; 621 C; 711 G; 622 T; 0 U; 0 Other; 

Query Match 2.4%; Score 37; DB 9; Length 2692; 

Best Local Similarity 51.5%; Pred. No. 3.2; 

Matches 85; Conservative 0; Mismatches 80; Indels 0; Gaps 0; 

Qy  703 TTCCTCCTCCTCCTCCTGCGCAGGGCAGAGCCTGGACATTAAAACATGCCCTGCCTGAAG 762
        | |  |||  |  | | |     | ||||  |||||| ||  | ||   | ||   ||
Db  235 TGCAACCTGATGTTTCCGGATTTGCCAGACTCTGGACGTTCCACCAGAACGTGGGAGATT 176

Qy  763 CCGCTTGCTGCTTCTCACTGATTTCTGCTCTCCCCTTCCTTGACTCGCCCACCACCTGTC 822
        |  |  || ||  ||||  || |||||| | |  |   ||   |    |    ||  |||
Db  175 CTCCCAGCCGCACCTCAAGGAATTCTGCGCCCTTCCAGCTCAGCCGATCGGAGACACGTC 116

Qy  823 CTGTGTAGATGGAGAAGGCTCGGAGAGTGGGGGTGCTGGGGGCAC 867
         | |  ||||||  |  ||||   | |  || | |  || || ||
Db  115 TTCTCGAGATGGTTATAGCTCCATGCGGAGGTGAGGCGGCGGGAC 71
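
The three summary lines printed for each hit appear to be linked by simple arithmetic. Across the hits listed here, Query Match equals Score divided by the perfect score of 1570 given in the search header, and Best Local Similarity equals Matches divided by the number of alignment columns (Matches + Conservative + Mismatches + Indels). The Python sketch below recomputes both percentages for RESULT 23. The function names are illustrative, and the two formulas are inferred from the printed figures rather than taken from the search program's documentation.

```python
import re

PERFECT_SCORE = 1570  # "Perfect score" from the search header


def parse_summary(text):
    """Pull the numeric fields out of one hit's summary lines."""
    fields = {}
    for name in ("Score", "Matches", "Conservative",
                 "Mismatches", "Indels", "Gaps"):
        m = re.search(name + r"\s+([\d.]+)", text)
        if m:
            fields[name] = float(m.group(1))
    return fields


def query_match(fields):
    """Assumed formula: score as a percentage of the perfect score."""
    return round(100 * fields["Score"] / PERFECT_SCORE, 1)


def best_local_similarity(fields):
    """Assumed formula: identical columns over all alignment columns."""
    columns = sum(fields[k] for k in
                  ("Matches", "Conservative", "Mismatches", "Indels"))
    return round(100 * fields["Matches"] / columns, 1)


summary = """Query Match 2.4%; Score 37; DB 9; Length 2692;
Best Local Similarity 51.5%; Pred. No. 3.2;
Matches 85; Conservative 0; Mismatches 80; Indels 0; Gaps 0;"""

fields = parse_summary(summary)
print(query_match(fields), best_local_similarity(fields))
```

A check of this kind is mainly useful for spotting digits damaged in transcription: if a recomputed percentage disagrees with the printed one, one of the counts on that hit has probably been misread.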



RESULT 24 
AAH24065/c 

ID AAH24065 standard; DNA; 4590 BP. 
XX 

AC AAH24065; 
XX 

DT 29-AUG-2001 (first entry) 
XX 

DE Yeast AOD9604-associated DNA sequence, SEQ ID NO:1. 
XX 

KW Human growth hormone analogue peptide; hGH; AOD9604; lipid metabolism; 

KW modulation; lipolysis stimulation; hormone-sensitive lipase stimulation; 

KW lipogenesis inhibition; acetyl CoA carboxylase inhibition; obesity; 

KW functional food; transgenic yeast; fat/lean ratio; food use; ds. 
XX 

OS Saccharomyces cerevisiae. 



XX 

FH Key Location/Qualifiers 

FT misc_feature 10 

FT /*tag= a 

FT /note= "Represented as * in the specification" 

FT misc_feature 3617 

FT /*tag= b 

FT /note= "Represented as * in the specification" 

FT misc_feature 3649 

FT /*tag= c 

FT /note= "Represented as * in the specification" 

FT misc_feature 3679 

FT /*tag= d 

FT /note= "Represented as * in the specification" 

FT misc_feature 3819 

FT /*tag= e 

FT /note= "Represented as * in the specification" 

FT misc_feature 3862 

FT /*tag= f 

FT /note= "Represented as * in the specification" 

FT misc_feature 3864 

FT /*tag= g 

FT /note= "Represented as * in the specification" 

FT misc_feature 3888 

FT /*tag= h 

FT /note= "Represented as * in the specification" 

FT misc_feature 3890 

FT /*tag= i 

FT /note= "Represented as * in the specification" 

FT misc_feature 3912 

FT /*tag= j 

FT /note= "Represented as * in the specification" 

FT misc_feature 3914 

FT /*tag= k 

FT /note= "Represented as * in the specification" 

FT misc_feature 3938 

FT /*tag= l 

FT /note= "Represented as * in the specification" 

FT misc_feature 3939 

FT /*tag= m 

FT /note= "Represented as * in the specification" 

FT misc_feature 3941 

FT /*tag= o 

FT /note= "Represented as * in the specification" 

FT misc_feature 3943 

FT /*tag= p 

FT /note= "Represented as * in the specification" 

FT misc_feature 4361 

FT /*tag= q 

FT /note= "Represented as * in the specification" 
XX 

PN WO200133977-A1. 
XX 

PD 17-MAY-2001. 
XX 

PF 06-NOV-2000; 2000WO-AU001362. 
XX 

PR 05-NOV-1999; 99AU-00003875. 
XX 

PA (META-) METABOLIC PHARM LTD. 
XX 

PI Belyea CI, Ng FM, Vaughan P; 
XX 

DR WPI; 2001-328876/34. 
XX 

PT New organisms containing nucleic acid encoding a growth hormone fragment 



PT which modulates lipid metabolism are useful to produce dietary aids for 

PT obesity and in the meat production industry. 

XX 

PS Disclosure; Page 48-50; 54pp; English. 
XX 

CC The invention relates to novel transgenic organisms useful in the 

CC production of functional food and drink products for the treatment or 

CC prevention of obesity via the regulation of lipid metabolism. The 

CC organisms comprise a polynucleotide encoding a growth hormone fragment 

CC capable of stimulating the activity of hormone-sensitive lipase (the key 

CC enzyme in lipolysis) and inhibiting acetyl CoA carboxylase (the key 

CC enzyme in lipogenesis). The growth hormone fragment preferably contains 

CC at least the disulphide-bonded loop of a mammalian growth hormone (but is 

CC not the full-length growth hormone) and is optionally linked to an 

CC epitope tag or heterologous fusion protein partner. The transgenic 

CC organism may be a microorganism used to produce a fermented product 

CC (e.g., yeast), or an edible plant or animal or cell thereof. Food or 

CC drink made using methods of the invention are used to modify fat/lean 

CC ratio, lipid metabolism or food use in a mammal. In particular, the food 

CC or drink products may be used to treat or prevent obesity, particularly 

CC in humans, and may also be used to improve the fat/lean ratio of 

CC livestock raised for meat production. In the exemplification of the 

CC invention, the human growth hormone (hGH) fragment analogue AOD9604 was 

CC expressed in yeast, optionally fused to the FLAG epitope (AAB73625). The 

CC present sequence is described as a DNA sequence from yeast in the 

CC sequence listing, but is not further referred to in the specification. 

XX 

SQ Sequence 4590 BP; 661 A; 384 C; 127 G; 522 T; 0 U; 2896 Other; 

Query Match 2.4%; Score 37; DB 5; Length 4590; 

Best Local Similarity 9.3%; Pred. No. 4.3; 

Matches 60; Conservative 298; Mismatches 285; Indels 0; Gaps 0; 



Qy 210 ACCGTGTGTTCTGCCTATTGTCGAGATAAGGACACTCTGGCTAAAGGTACATCAGATAAT 269 

:::::::: :::: |: : |: :| : |: : :::| |::::: :| : : 

Db 4375 VYSYYTDSYRYANAYHHHVNTHCHAADGMGTDDAYCHSYYHYWASYGKHSRHNWGSHNHN 4316 

Qy 270 GGCATCGTTGGCCAAATTGGTGAACTGTTATCTCACGAGGATTCCAGGGCTGGGTAGGAT 329 

::: :::::::: : :: :| |: ::::::::::::: ::| |::: 

Db 4315 S RHNWS S DDS RHNWS RHNWAHGS S AT KAS GHHYWHAS S VKDHS WDDWNYYGT YTVKRSN 4256 

Qy 330 CGGACAGGGCACTCCCATTGGCTCCTCAGTTAAAGCTGCCCTGGAGCCGGACAGGCCACT 389 

Db 4255 TKYWNSHACKSSWMMSWWMSMYHSTBTSRYBGYATKAGSRHNWHSTBTSRYBGYATKAGS 4196 

Qy 390 AGAAAATTCACTTGCATTTGCTTCCTGCTAGCCATGGGTGAGCTGCCCTTTCTGAGTCCA 449 
: : : | : : : | : | | : | : : : : : : : : : : : : I : I : 1 : 
Db 4195 RHNWHSTBTSRYBGYATKAGSRHNWHSTBTSRYBGYATKAGSRHNWHSTBTSRYBGYATK 4136 

Qy 450 GAGGGAGCCAGAGGGCCTCACATCAACAGAGGGTCTCTGAGCTCCCTGGAGCAAGGTTCG 509 

Db 4135 AGSRHNWHSTBTSRYBGYATKAGSRHNWGHMSRHNWKDSVKSRHNWNMYHWCARRYWBH 4076 

Qy 510 GTCACGGGCACAGAGGCTCGGCACAGCTTAGGTGTCCTGCATGTGTCCTACAGCGTCAGG 569 

Db 4075 VHNMRWMKKKKMGKKHGSYVKNNYVKNCTYYAYYHTDANDTYCTYTATHTDMGCNHTDDD 4016 



Qy 570 TAAGGGGACCTCCACAGCAAAAAGCTAGGCTCTCTGATTGCCTTTTCTGAATGGGTGGGT 629 

: : | : : : : : : : : :::: | ::::::::::: : : : : : : 

Db 4015 DKGTKYNTTTTDHKMDKGGGDKGKDCMDHDHDDMDGBBBBBBBBBBBBBBBBKMCHVTDG 3956 

Qy 630 GGGCCTGTGGGCTTTGGGTTGTCTGTCCAGCAGATCAGGGTGAAAGTGGACAGTCTGTAA 689 

Db 3955 ANDHDHDHDHGANDNDNNDKDKDCYNKRRBHHHDHDHDBYVNDNDGWHNDHDHDHDHDHD 3896 

Qy 690 CAACAGTGAGTCGTTCCTCCTCCTCCTCCTGCGCAGGGCAGAGCCTGGACATTAAAACAT 749 

:::: : :::: ::::::::: :: : : : :|::|| :l h :::: I == 

Db 3895 HHWBNDNWBHGHDBYVWBVYHNHHDHDHVYNDNDGRDCANKHGMTHGMRRKKHKAGHMS 3836 

Qy 750 GCCCTGCCTGAAGCCGCTTGCTGCTTCTCACTGATTTCTGCTCTCCCCTTCCTTGACTCG 809 

: : : : : : : I : : : I I I | I : I : I : : : | : I : : : I II : 

Db 3835 RHNWKDSVKATKYCYKNKTNCTCTCTTTYASTSRNYAATMYTKHTYAHNTANATAAASNS 3776 

Qy 810 CCCACCACCTGTCCTGTGTAGATGGAGAAGGCTCGGAGAGTGG 852 

: | : I : : : : : : | : : : I : : : : I : I 

Db 3775 WMGTDDAYCSRNYAATANATYDARVHAANKBHTYASHNHNTDG 3733 

RESULT 25 
ABN83968 

ID ABN83968 standard; DNA; 6843 BP. 
XX 

AC ABN83968; 
XX 

DT 06-SEP-2002 (first entry) 
XX 

DE Human gene sequence #15. 
XX 
KW Human; brain; tonsil; hippocampus; foetal brain; diagnosis; gene; ds. 
XX 

OS Homo sapiens. 
XX 

FH Key Location/Qualifiers 
FT CDS 6088..6471 

FT /*tag= a 

FT /partial 
FT /note= "no start codon present" 
XX 

PN WO200252005-A1. 
XX 

PD 04-JUL-2002. 
XX 

PF 20-DEC-2001; 2001WO-JP011217. 
XX 

PR 22-DEC-2000; 2000JP-00389742. 
XX 

PA (KAZU-) KAZUSA DNA RES INST FOUND. 
PA (CELE-) CELESTAR LEXICO-SCI LTD. 
XX 

PI Ohara O, Nagase T, Nakajima D; 
XX 

DR WPI; 2002-500762/53. 
DR P-PSDB; ABB97948. 
XX 

PT Genes and their expression products cloned from human cDNA libraries for 

PT treatment and diagnosis of diseases associated with their expression. 
XX 

PS Claim 1(a); Page 123-127; 238pp; Japanese. 
XX 

CC The invention relates to DNA encoding polypeptides directly cloned from 

CC cDNA libraries originating in adult whole brain, human tonsil, human 

CC adult hippocampus and human foetal whole brain. Polypeptides and 

CC polynucleotides of the invention may be used in the investigation of 

CC differential expression of the DNA sequences in normal subjects and 

CC disease patients. They may also be used in the production of antibodies, 

CC oligonucleotide probes and DNA chips for diagnosis and identification of 

CC drugs for treatment of diseases with which the DNA sequences are 

CC associated. The sequences given in records ABN83954-ABN83984 represent 

CC human gene sequences of the invention. 
XX 

SQ Sequence 6843 BP; 1812 A; 1693 C; 1483 G; 1855 T; 0 U; 0 Other; 

Query Match 2.3%; Score 36.8; DB 6; Length 6843; 

Best Local Similarity 49.0%; Pred. No. 6.3; 

Matches 98; Conservative 0; Mismatches 102; Indels 0; Gaps 0; 

Qy  741 TTAAAACATGCCCTGCCTGAAGCCGCTTGCTGCTTCTCACTGATTTCTGCTCTCCCCTTC 800
        |||| | |  |  ||  |||  | ||  | |   | |  | || | || |||| |  ||
Db  689 TTAATATACTCTATGGATGACCCAGCAAGTTTGCTGTTTCAGAATCCTCCTCTTCTGTTT 748

Qy  801 CTTGACTCGCCCACCACCTGTCCTGTGTAGATGGAGAAGGCTCGGAGAGTGGGGGTGCTG 860
         ||||     | |  ||      | ||  |   ||   ||| |  |||||| |  | ||
Db  749 TTTGAACTTTCGAAAACAAAAGATATGCTGGGAGACGCGGCCCCTAGAGTGTGCTTACTC 808

Qy  861 GGGGCACAAAATGGAATGAACACTGCTGAAGGAATGCAGGGTTCACTTCAAGAAGAAAGC 920
          ||  |   || |     ||  ||  |  |||| |   | |  |   ||||| |  | |
Db  809 CAGGTCCTTGATTGTCCAGACTGTGGAGGGGGAAGGGCAGATCTATGCCAAGAGGGGAAC 868

Qy  921 AGTGTGCAGGTGTACCATCT 940
        ||  || ||  |   || ||
Db  869 AGGCTGTAGAGGCCACAGCT 888

RESULT 26 
AAS93276/c 

ID AAS93276 standard; cDNA; 541 BP. 
XX 

AC AAS93276; 
XX 

DT 13-FEB-2002 (first entry) 
XX 

DE DNA encoding novel human diagnostic protein #29080. 
XX 

KW Human; chromosome mapping; gene mapping; gene therapy; forensic; 

KW food supplement; medical imaging; diagnostic; genetic disorder; ss. 
XX 

OS Homo sapiens. 
XX 

PN WO200175067-A2. 
XX 

PD 11-OCT-2001. 



XX 

PF 30-MAR-2001; 2001WO-US008631. 
XX 

PR 31-MAR-2000; 2000US-00540217. 

PR 23-AUG-2000; 2000US-00649167. 
XX 

PA (HYSE-) HYSEQ INC. 
XX 

PI Drmanac RT, Liu C, Tang YT; 
XX 

DR WPI; 2001-639362/73. 

DR P-PSDB; ABG29089. 
XX 

PT New isolated polynucleotide and encoded polypeptides, useful in 

PT diagnostics, forensics, gene mapping, identification of mutations 

PT responsible for genetic disorders or other traits and to assess 

PT biodiversity. 
XX 

PS Claim 1; SEQ ID NO 29080; 103pp; English. 
XX 

CC The invention relates to isolated polynucleotide (I) and polypeptide (II) 

CC sequences. (I) is useful as hybridisation probes, polymerase chain 

CC reaction (PCR) primers, oligomers, and for chromosome and gene mapping, 

CC and in recombinant production of (II). The polynucleotides are also used 

CC in diagnostics as expressed sequence tags for identifying expressed 

CC genes. (I) is useful in gene therapy techniques to restore normal 

CC activity of (II) or to treat disease states involving (II). (II) is 

CC useful for generating antibodies against it, detecting or quantitating a 

CC polypeptide in tissue, as molecular weight markers and as a food 

CC supplement. (II) and its binding partners are useful in medical imaging 

CC of sites expressing (II). (I) and (II) are useful for treating disorders 

CC involving aberrant protein expression or biological activity. The 

CC polypeptide and polynucleotide sequences have applications in 

CC diagnostics, forensics, gene mapping, identification of mutations 

CC responsible for genetic disorders or other traits to assess biodiversity 

CC and to produce other types of data and products dependent on DNA and 

CC amino acid sequences. AAS64197-AAS94564 represent novel human diagnostic 

CC coding sequences of the invention. Note: The sequence data for this 

CC patent did not appear in the printed specification, but was obtained in 

CC electronic format directly from WIPO at 

CC ftp.wipo.int/pub/published_pct_sequences 

XX 

SQ Sequence 541 BP; 121 A; 154 C; 168 G; 98 T; 0 U; 0 Other; 

Query Match 2.3%; Score 36.2; DB 5; Length 541; 
Best Local Similarity 50.3%; Pred. No. 2.2; 

Matches 83; Conservative 2; Mismatches 80; Indels 0; Gaps 0; 

Qy 1174 GGGCCTTGGTGGAACATCAAATCATGCCAGCAGAAGTGGGACAGGCAAATCCTCAAAGAT 1233
        | |||||| |||| | ||    |    |||  |  || || |||||    |     |
Db  223 GTGCCTTGCTGGATCCTCTGCGCCGCGCAGATGCCGTAGGCCAGGCCGGGCACAGTACTG 164

Qy 1234 GTCTCCTTGTACATCGAGAGTGGCCAGATTATGTGCATCTTAGGCAGCTCAGGTAAGTGC 1293
        ||      | ||| |  | | |||  |       ||  ||  |||||||| ||   | ||
Db  163 GTGCAGAGGCACACCTCGCGAGGCAGGTCCCGCAGCCACTCCGGCAGCTCCGGCGGGGGC 104

Qy 1294 CTGGGGGGSCSGGGGCTCCTGTACTTCTAAGGCAGGCTCTGGGAG 1338
          || || : : | ||| |||     |  | || ||| | | |||
Db  103 GCGGCGGCCGCAGCGCTGCTGGTGCCCACAAGCCGGCGCAGCGAG 59
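
Where the match line between a Qy row and a Db row has been damaged, it can be rebuilt from the two sequence rows. The Python sketch below marks identical bases with `|`, IUPAC-compatible ambiguity codes with `:` and everything else with a space, then checks the last row of RESULT 26. The function name is illustrative, and treating `:` as "compatible ambiguity code" is an assumption inferred from the S-versus-C columns in this report, not documented behaviour of the search program.

```python
# IUPAC nucleotide codes and the bases each one stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T", "U": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def match_line(qy, db):
    """Rebuild the marker row printed between a Qy row and a Db row."""
    marks = []
    for q, d in zip(qy.upper(), db.upper()):
        if q == d and q in "ACGT":
            marks.append("|")          # identical unambiguous bases
        elif set(IUPAC.get(q, "")) & set(IUPAC.get(d, "")):
            marks.append(":")          # assumed: compatible ambiguity codes
        else:
            marks.append(" ")          # mismatch or gap
    return "".join(marks)


qy = "CTGGGGGGSCSGGGGCTCCTGTACTTCTAAGGCAGGCTCTGGGAG"
db = "GCGGCGGCCGCAGCGCTGCTGGTGCCCACAAGCCGGCGCAGCGAG"
line = match_line(qy, db)
print(qy)
print(line)
print(db)
```

For this row the rebuilt line contains 23 `|` and 2 `:` markers. Together with 30 identities in each of the two preceding rows, that gives the 83 matches and 2 conservative positions reported for the hit.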



RESULT 27 
ADA08012/c 

ID ADA08012 standard; cDNA; 2236 BP. 
XX 

AC ADA08012; 
XX 

DT 06-NOV-2003 (first entry) 
XX 

DE cDNA encoding human PR Family member 3b (PFM3b). 
XX 

KW Human; PR Family Member 1; PR Family Member 2; PR Family Member 3; 

KW PR Family Member 4; PR Family Member 5; PFM1; PFM2; PFM3; PFM4; PFM5; 

KW PFM PR domain; PFM zinc finger domain; PFM ZF domain; 

KW modulation of cell growth; cancer; cell degeneration disease; 

KW Alzheimer's disease; Parkinson's disease; 

KW insulin-dependent diabetes mellitus; IDDM; neuroprotective; 

KW antiparkinsonian; antidiabetic; cytostatic; gene; ss. 

XX 

OS Homo sapiens. 
XX 

PN US6586579-B1. 
XX 

PD 01-JUL-2003. 
XX 

PF 03-SEP-1999; 99US-00389956. 
XX 

PR 03-SEP-1999; 99US-00389956 . 
XX 

PA (BURN-) BURNHAM INST. 
XX 

PI Huang S; 
XX 

DR WPI; 2003-669568/63. 

DR P-PSDB; ADA08013. 
XX 

PT New PR Family Member 2 oligonucleotide, useful for preparing a 

PT composition for modulating cell growth for treating cancer or diseases of 

PT cell degeneration, e.g., Alzheimer's disease or insulin-dependent 

PT diabetes mellitus. 

XX 

PS Example 3; Fig 6A; 95pp; English. 
XX 

CC The present invention relates to the isolation of human and mouse PR 

CC Family Member (PFM) proteins, and the polynucleotide sequences encoding 

CC them. Also disclosed are PFM PR and PFM zinc finger (ZF) domains, and the 

CC polynucleotide sequences encoding them. The invention also discloses PFM 

CC oligonucleotides and methods for detecting a PFM polynucleotide sequence 

CC in a sample. The PFM polypeptide and polynucleotide sequences are useful 

CC for preparing a composition for modulating cell growth for treating 

CC cancer or diseases of cell degeneration, e.g. as Alzheimer's disease, 

CC Parkinson's disease or insulin-dependent diabetes mellitus (IDDM). The 

CC present sequence represents a PFM polynucleotide sequence of the 

CC invention. 



XX 

SQ Sequence 2236 BP; 516 A; 656 C; 580 G; 484 T; 0 U; 0 Other; 

Query Match 2.3%; Score 36.2; DB 8; Length 2236; 

Best Local Similarity 50.3%; Pred. No. 5.1; 

Matches 83; Conservative 2; Mismatches 80; Indels 0; Gaps 0; 

Qy 1174 GGGCCTTGGTGGAACATCAAATCATGCCAGCAGAAGTGGGACAGGCAAATCCTCAAAGAT 1233
        | |||||| |||| | ||    |    |||  |  || || |||||    |     |
Db  713 GTGCCTTGCTGGATCCTCTGCGCCGCGCAGATGCCGTAGGCCAGGCCGGGCACAGTACTG 654

Qy 1234 GTCTCCTTGTACATCGAGAGTGGCCAGATTATGTGCATCTTAGGCAGCTCAGGTAAGTGC 1293
        ||      | ||| |  | | |||  |       ||  ||  |||||||| ||   | ||
Db  653 GTGCAGAGGCACACCTCGCGAGGCAGGTCCCGCAGCCACTCCGGCAGCTCCGGCGGGGGC 594

Qy 1294 CTGGGGGGSCSGGGGCTCCTGTACTTCTAAGGCAGGCTCTGGGAG 1338
          || || : : | ||| |||     |  | || ||| | | |||
Db  593 GCGGCGGCCGCAGCGCTGCTGGTGCCCACAAGCCGGCGCAGCGAG 549




RESULT 28 
ADA08010/c 

ID ADA08010 standard; cDNA; 2488 BP. 
XX 

AC ADA08010; 
XX 

DT 06-NOV-2003 (first entry) 
XX 

DE cDNA encoding human PR Family member 3a (PFM3a). 
XX 

KW Human; PR Family Member 1; PR Family Member 2; PR Family Member 3; 

KW PR Family Member 4; PR Family Member 5; PFM1; PFM2; PFM3; PFM4; PFM5; 

KW PFM PR domain; PFM zinc finger domain; PFM ZF domain; 

KW modulation of cell growth; cancer; cell degeneration disease; 

KW Alzheimer's disease; Parkinson's disease; 

KW insulin-dependent diabetes mellitus; IDDM; neuroprotective; 

KW antiparkinsonian; antidiabetic; cytostatic; gene; ss. 
XX 

OS Homo sapiens. 
XX 

PN US6586579-B1. 
XX 

PD 01-JUL-2003. 
XX 

PF 03-SEP-1999; 99US-00389956. 
XX 

PR 03-SEP-1999; 99US-00389956. 
XX 

PA (BURN-) BURNHAM INST. 
XX 

PI Huang S; 
XX 

DR WPI; 2003-669568/63. 

DR P-PSDB; ADA08011. 
XX 

PT New PR Family Member 2 oligonucleotide, useful for preparing a 

PT composition for modulating cell growth for treating cancer or diseases of 

PT cell degeneration, e.g., Alzheimer's disease or insulin-dependent 

PT diabetes mellitus. 

XX 

PS Example 3; Fig 5A; 95pp; English. 
XX 

CC The present invention relates to the isolation of human and mouse PR 

CC Family Member (PFM) proteins, and the polynucleotide sequences encoding 

CC them. Also disclosed are PFM PR and PFM zinc finger (ZF) domains, and the 

CC polynucleotide sequences encoding them. The invention also discloses PFM 

CC oligonucleotides and methods for detecting a PFM polynucleotide sequence 

CC in a sample. The PFM polypeptide and polynucleotide sequences are useful 

CC for preparing a composition for modulating cell growth for treating 

CC cancer or diseases of cell degeneration, e.g. as Alzheimer's disease, 

CC Parkinson's disease or insulin-dependent diabetes mellitus (IDDM). The 

CC present sequence represents a PFM polynucleotide sequence of the 

CC invention. 
XX 

SQ Sequence 2488 BP; 587 A; 707 C; 645 G; 549 T; 0 U; 0 Other; 

Query Match 2.3%; Score 36.2; DB 8; Length 2488; 

Best Local Similarity 50.3%; Pred. No. 5.4; 

Matches 83; Conservative 2; Mismatches 80; Indels 0; Gaps 0; 

Qy 1174 GGGCCTTGGTGGAACATCAAATCATGCCAGCAGAAGTGGGACAGGCAAATCCTCAAAGAT 1233
        | |||||| |||| | ||    |    |||  |  || || |||||    |     |
Db  713 GTGCCTTGCTGGATCCTCTGCGCCGCGCAGATGCCGTAGGCCAGGCCGGGCACAGTACTG 654

Qy 1234 GTCTCCTTGTACATCGAGAGTGGCCAGATTATGTGCATCTTAGGCAGCTCAGGTAAGTGC 1293
        ||      | ||| |  | | |||  |       ||  ||  |||||||| ||   | ||
Db  653 GTGCAGAGGCACACCTCGCGAGGCAGGTCCCGCAGCCACTCCGGCAGCTCCGGCGGGGGC 594

Qy 1294 CTGGGGGGSCSGGGGCTCCTGTACTTCTAAGGCAGGCTCTGGGAG 1338
          || || : : | ||| |||     |  | || ||| | | |||
Db  593 GCGGCGGCCGCAGCGCTGCTGGTGCCCACAAGCCGGCGCAGCGAG 549



RESULT 29 
AAS93277/c 

ID AAS93277 standard; cDNA; 4003 BP. 
XX 

AC AAS93277; 
XX 

DT 13-FEB-2002 (first entry) 
XX 

DE DNA encoding novel human diagnostic protein #29081. 
XX 

KW Human; chromosome mapping; gene mapping; gene therapy; forensic; 

KW food supplement; medical imaging; diagnostic; genetic disorder; ss. 
XX 

OS Homo sapiens. 
XX 

PN WO200175067-A2. 
XX 

PD 11-OCT-2001. 
XX 

PF 30-MAR-2001; 2001WO-US008631. 
XX 

PR 31-MAR-2000; 2000US-00540217. 

PR 23-AUG-2000; 2000US-00649167. 
XX 

PA (HYSE-) HYSEQ INC. 
XX 

PI Drmanac RT, Liu C, Tang YT; 
XX 

DR WPI; 2001-639362/73. 

DR P-PSDB; ABG29090. 
XX 

PT New isolated polynucleotide and encoded polypeptides, useful in 

PT diagnostics, forensics, gene mapping, identification of mutations 

PT responsible for genetic disorders or other traits and to assess 

PT biodiversity. 
XX 

PS Claim 1; SEQ ID NO 29081; 103pp; English. 
XX 

CC The invention relates to isolated polynucleotide (I) and polypeptide (II) 

CC sequences. (I) is useful as hybridisation probes, polymerase chain 

CC reaction (PCR) primers, oligomers, and for chromosome and gene mapping, 

CC and in recombinant production of (II). The polynucleotides are also used 

CC in diagnostics as expressed sequence tags for identifying expressed 

CC genes. (I) is useful in gene therapy techniques to restore normal 

CC activity of (II) or to treat disease states involving (II). (II) is 

CC useful for generating antibodies against it, detecting or quantitating a 

CC polypeptide in tissue, as molecular weight markers and as a food 

CC supplement. (II) and its binding partners are useful in medical imaging 

CC of sites expressing (II). (I) and (II) are useful for treating disorders 

CC involving aberrant protein expression or biological activity. The 

CC polypeptide and polynucleotide sequences have applications in 

CC diagnostics, forensics, gene mapping, identification of mutations 

CC responsible for genetic disorders or other traits to assess biodiversity 

CC and to produce other types of data and products dependent on DNA and 

CC amino acid sequences. AAS64197-AAS94564 represent novel human diagnostic 

CC coding sequences of the invention. Note: The sequence data for this 

CC patent did not appear in the printed specification, but was obtained in 

CC electronic format directly from WIPO at 

CC ftp.wipo.int/pub/published_pct_sequences

XX 

SQ Sequence 4003 BP; 894 A; 1117 C; 1153 G; 839 T; 0 U; 0 Other; 



Query Match 2.3%; Score 36.2; DB 5; Length 4003; 

Best Local Similarity 50.3%; Pred. No. 7.1; 

Matches 83; Conservative 2; Mismatches 80; Indels 0; Gaps 0; 

Qy 1174 GGGCCTTGGTGGAACATCAAATCATGCCAGCAGAAGTGGGACAGGCAAATCCTCAAAGAT 1233

I I I I I I I I I I I I I I I III I I I I I I II I I I I 

Db 1033 GTGCCTTGCTGGATCCTCTGCGCCGCGCAGATGCCGTAGGCCAGGCCGGGCACAGTACTG 974 

Qy 1234 GTCTCCTTGTACATCGAGAGTGGCCAGATTATGTGCATCTTAGGCAGCTCAGGTAAGTGC 1293 

II I I I I I I I I I I I II II I I I I I I I I I I III 

Db 973 GTGCAGAGGCACACCTCGCGAGGCAGGTCCCGCAGCCACTCCGGCAGCTCCGGCGGGGGC 914 

Qy 1294 CTGGGGGGSCSGGGGCTCCTGTACTTCTAAGGCAGGCTCTGGGAG 1338 

I I I I : : I I I I I I I I I I I I I I I I I I I 

Db 913 GCGGCGGCCGCAGCGCTGCTGGTGCCCACAAGCCGGCGCAGCGAG 869



RESULT 30 


AAL19738 


ID AAL19738 standard; cDNA; 397 BP.


XX

AC AAL19738;
XX

DT 07-DEC-2001 (first entry)
XX






DE Human breast cancer expressed polynucleotide 12195.
XX






U 11TTlari • Kvpacf C^T^ Cf^T * CF^ll marker; CVtOStatiC* SS. 


XX 




OS Homo sapiens.


XX 




PN 


WO/UUIjIdZo Az . 


XX 




PD 19-JUL-2001.


XX 




PF 




XX 




PR 


14 Tatj— 9nnn* 9 o o ott^— o 1 7 607 7 p 


PR 


i/i ivtad onnn • onn pit tq — niftQlf^p 


PR 


oa nnn • 9000tt^— m 99099P 


PR 


Z y "IViAK - Z U U U / ZUUUUj Uljj'iour . 


D "D 

r K 


1 R-Mav-?nnn • 90ootts-o° os°30P 


PR 


n q tt TM o rt n n • 9nnnTTc* 091 1 *3 1 sp 
(J y — J UJM ~z u u u > ZUUUU j UZllOl Jr . 


r K 


o c: TTTT — 9000 • 9 0 0 0T J^— 0 9 ? 0 S 3 4 P 


XX

PA (MILL-) MILLENNIUM PREDICTIVE MEDICINE INC.


XX 




PI 


Lillie J, Xu Y, wang i, bieinmann rv, 


XX 




DR 


WPI; 2001-4oloob/ 4o . 


XX 




PT 


kt_ tt -r> Q -r>-h t ^ j=i neofnl a <; 3 marlfpr for t~ h dl acmOSlS of breast CSnCCI. 


XX 




PS 


Claim 1; rage zioo, ooyopp, tngiibn. 


XX

CC The invention relates to human breast cancer expressed polynucleotides




CC 


/ a a t r\ 1 ^ /1 4 — a at 9 £"7 p q \ ^riH m^t'hnHc: of 7 accpcqi na whether a Datient is 




CC afflicted with breast cancer by examining the correlation between the
CC expression of certain markers and the cancerous state of breast cells.
CC The polynucleotides and encoded polypeptides are potential markers for
CC detecting, diagnosing, monitoring, characterising treating and
CC potentially preventing breast cancer. The polynucleotides and encoded
CC polypeptides are also useful for isolating compounds with cytostatic
CC activity


XX 




SQ Sequence 397 BP; 137 A; 84 C; 84 G; 91 T; 0 U; 1 Other;



Query Match 2.3%; Score 35.8; DB 4; Length 397; 

Best Local Similarity 62.5%; Pred. No. 2.5; 

Matches 55; Conservative 0; Mismatches 33; Indels 0; Gaps 0;



Qy 1451 CCTAGTACCAAAGTGAAATCTTGAGGAAAATCCCTGGAAAGAGTGGAAAGTCCTGCCTAA 1510



Db 206 CTTAATACCTCCAGCAACCAGTTGTGACAATACATGCAAAGAGTGCAAAGTCTTGTCCAC 265



Qy 1511 CACGTAAGTGCCTTCTTTGCTTGTTTGA 1538 

I I I I II I 1 I I 1 I I I I I I I I 
Db 266 GACGGATGTNCTTTTTTTTTTTTTTTGA 293



RESULT 31 
AAH13437/c 

ID AAH13437 standard; cDNA; 493 BP. 
XX 

AC AAH13437; 
XX 

DT 26-JUN-2001 (first entry) 
XX 

DE Human cDNA clone (3'-primer) SEQ ID NO: 10272.
XX 

KW Human; primer; detection; diagnosis; antisense therapy; gene therapy; ss. 
XX 

OS Homo sapiens. 
XX 

PN EP1074617-A2. 
XX 

PD 07-FEB-2001. 
XX 

PF 28-JUL-2000; 2000EP-00116126.
XX 

PR 29-JUL-1999; 99JP-00248036.

PR 27-AUG-1999; 99JP-00300253.

PR 11-JAN-2000; 2000JP-00118776.

PR 02-MAY-2000; 2000JP-00183767.

PR 09-JUN-2000; 2000JP-00241899.
XX 

PA (HELI-) HELIX RES INST. 
XX 

PI Ota T, Isogai T, Nishikawa T, Hayashi K, Saito K, Yamamoto J; 

PI Ishii S, Sugiyama T, Wakamatsu A, Nagai K, Otsuki T; 

XX 

DR WPI; 2001-318749/34. 
XX 

PT Primer sets for synthesizing polynucleotides, particularly the 5602 full- 

PT length cDNAs defined in the specification, and for the detection and/or 

PT diagnosis of the abnormality of the proteins encoded by the full-length 

PT cDNAs.
XX 

PS Claim 3; SEQ ID NO 10272; 2537pp + Sequence Listing; English. 
XX 

CC The present invention describes primer sets for synthesising 5602 full- 

CC length cDNAs defined in the specification. Where a primer set comprises: 

CC (a) an oligo-dT primer and an oligonucleotide complementary to the 

CC complementary strand of a polynucleotide which comprises one of the 5602 

CC nucleotide sequences defined in the specification, where the 

CC oligonucleotide comprises at least 15 nucleotides; or (b) a combination 

CC of an oligonucleotide comprising a sequence complementary to the 

CC complementary strand of a polynucleotide which comprises a 5'-end

CC sequence and an oligonucleotide comprising a sequence complementary to a

CC polynucleotide which comprises a 3'-end sequence, where the



CC oligonucleotide comprises at least 15 nucleotides and the combination of
CC the 5'-end sequence/3'-end sequence is selected from those defined in the
CC specification. The primer sets can be used in antisense therapy and in
CC gene therapy. The primers are useful for synthesising polynucleotides,
CC particularly full-length cDNAs. The primers are also useful for the
CC detection and/or diagnosis of the abnormality of the proteins encoded by
CC the full-length cDNAs. The primers allow obtaining of the full-length
CC cDNAs easily without any specialised methods. AAH03166 to AAH13628 and
CC AAH13633 to AAH18742 represent human cDNA sequences; AAB92446 to AAB95893
CC represent human amino acid sequences; and AAH13629 to AAH13632 represent
CC oligonucleotides, all of which are used in the exemplification of the
CC present invention
XX

SQ Sequence 493 BP; 112 A; 142 C; 138 G; 79 T; 0 U; 22 Other;



Query Match 2.3%; Score 35.8; DB 4; Length 493;

Best Local Similarity 57.5%; Pred. No. 2.8; 

Matches 61; Conservative 0; Mismatches 45; Indels 0; Gaps 0;

Qy 780 CTGATTTCTGCTCTCCCCTTCCTTGACTCGCCCACCACCTGTCCTGTGTAGATGGAGAAG 839

Ml II MM I I II 1 I I I II I M I M II I I M
Db 267 CTNAAGCCGGGGCTCCCCCTGCCTGCCTCTCTCTCCTCCTCCCCTNTGGGAATTGGGCAG 208

Qy 840 GCTCGGAGAGTGGGGGTGCTGGGGGCACAAAATGGAATGAACACTG 885

1 M II I I I I M I I II II II I I I I I I
Db 207 CCCTGGGCAGTTGTACTCATGGGGGCTTAANATGCAGCTACCTCAG 162



RESULT 32
ABN73379/c

ID ABN73379 standard; cDNA; 639 BP.
XX

AC ABN73379;
XX

DT 03-JUL-2002 (first entry)
XX

DE Bovine embryonic germ (EG) cell cDNA EST 000203a CONTIG 63.
XX

KW Bovine; Bos taurus; EST; expressed sequence tag; totipotence;
KW development; gene; ss.
XX

OS Bos taurus.
XX

PN WO200194550-A2.
XX

PD 13-DEC-2001.
XX

PF 07-JUN-2001; 2001WO-US018576.
XX

PR 07-JUN-2000; 2000US-0209874P.
PR 06-JUN-2001; 2001US-00876143.
XX

PA (INFI-) INFIGEN INC.
XX

PI Eilertsen KJ, Pfister-Genskow M, Childs L;
XX

DR WPI; 2002-351289/38.
XX

PT An expressed sequence tag (EST) , the expression of which, or its 

PT complementary sequence, in a cell identifies the cell as a 

PT developmentally competent or incompetent cell. 
XX 

PS Example 16; Page 161; 584pp; English. 
XX 

CC The present invention describes an expressed sequence tag (EST), where 

CC the EST is an isolated, enriched, or purified nucleic acid sequence 

CC representing all or part of a gene, the expression of which, or its 

CC complementary sequence, in a cell identifies the cell as a 

CC developmentally competent or incompetent cell. Molecules which induce 

CC developmental competence in a cell line are useful for inducing 

CC totipotence in one or more cells. Molecules which induce developmental 

CC incompetence in a cell line are useful for preventing a full term 

CC pregnancy in an animal and inhibiting totipotence. The molecules are also 

CC useful for treating a disease in an animal by inducing development of one 

CC or more cells of the animal into a specific cell type. The present 

CC sequence represents a bovine EST which is given in the exemplification of 

CC the present invention 

XX 

SQ Sequence 639 BP; 149 A; 184 C; 142 G; 156 T; 0 U; 8 Other; 

Query Match 2.3%; Score 35.8; DB 6; Length 639; 

Best Local Similarity 51.5%; Pred. No. 3.3; 

Matches 69; Conservative 4; Mismatches 61; Indels 0; Gaps 0; 

Qy 1185 GAACATCAAATCATGCCAGCAGAAGTGGGACAGGCAAATCCTCAAAGATGTCTCCTTGTA 1244

I I I : I : I I I I I I I II III: 1 I I I I II I I I I I 

Db 425 GAAVCTVCCGTCCTGCAAGTCAGAGTGGGACACACAAAGTCTGCTGTTTGTCGGCAGANC 366 

Qy 1245 CATCGAGAGTGGCCAGATTATGTGCATCTTAGGCAGCTCAGGTAAGTGCCTGGGGGGSCS 1304 

II III I I I I I I I I I I I I I I I II I I 1 I I I I I I : : 

Db 365 CACTTCAAGTACGNAAATNAAGAGCAGCATGAAGAGATCTGGTGAAATTCTGGGGGGGAG 306

Qy 1305 GGGGCTCCTGTACT 1318 

II III II 
Db 305 AAGGGAGCTGCTCT 292 



RESULT 33
ABL79659

ID ABL79659 standard; cDNA; 257 BP.
XX

AC ABL79659;
XX

DT 17-MAY-2002 (first entry)
XX

DE Human ovarian cancer related cDNA clone SEQ ID NO: 2637.
XX

KW Human; ovarian cancer; ovarian tumour; cytostatic; gene; ss.
XX

OS Homo sapiens.
XX

PN WO200192581-A2.
XX

PD 06-DEC-2001.
XX

PF 29-MAY-2001; 2001WO-US017756.
XX

PR 26-MAY-2000; 2000US-0207484P.
XX 

PA (CORI-) CORIXA CORP. 
XX 

PI Algate PA, Harlocker SL, Jones R; 
XX 

DR WPI; 2002-122075/16. 
XX 

PT Composition for therapy and diagnosis of ovarian cancer comprising 

PT polypeptide of a ovarian tumor polypeptide, polynucleotide encoding 

PT polypeptide, antibody specific to polypeptide or T cell expressing 

PT polypeptide. 
XX 

PS Claim 1; SEQ ID NO 2637; 489pp; English. 
XX 

CC The present invention describes a composition (I) comprising: carriers 

CC and immunostimulants; and a polypeptide (II) of a ovarian tumour

CC polypeptide encoded by a polynucleotide (III) having a cDNA sequence (S1)

CC from the 10912 nucleotide sequences as given in ABL77023 to ABL87934, 

CC (III) encoding (II) having a sequence (S2), a T cell population of (II), 

CC or antigen presenting cells that express (II). (I) has cytostatic

CC activity. An oligonucleotide (IV) that hybridises to (S1) can be used for

CC detecting ovarian cancer in a patient's biological sample preferably 

CC serum or ovarian tissue. The method comprises contacting a biological 

CC sample from a patient with (IV), detecting the amount of polynucleotide 

CC hybridising to (IV) and comparing the amount to a predetermined cutoff 

CC value and thereby detecting ovarian cancer in the patient, where the 

CC amount of polynucleotide hybridising to (IV) is detected preferably by 

CC polymerase chain reaction (PCR). (I) comprising (III) and/or (II) is

CC useful for stimulating and/or expanding T cells specific for an ovarian

CC tumour protein comprising contacting T cells with (III) or (II). (III) is

CC useful in design and preparation of ribozyme molecules for inhibiting 

CC expression of the tumour polypeptides and proteins in tumour cells; and 

CC to isolate a full length gene from a suitable library e.g., a tumour cDNA 

CC library using well known techniques 
XX 

SQ Sequence 257 BP; 28 A; 81 C; 83 G; 65 T; 0 U; 0 Other; 

Query Match 2.3%; Score 35.6; DB 6; Length 257; 
Best Local Similarity 58.5%; Pred. No. 2.2; 

Matches 62; Conservative 0; Mismatches 44; Indels 0; Gaps 0;

Qy 780 CTGATTTCTGCTCTCCCCTTCCTTGACTCGCCCACCACCTGTCCTGTGTAGATGGAGAAG 839

Db 12

Qy 840 GCTCGGAGAGTGGGGGTGCTGGGGGCACAAAATGGAATGAACACTG 885

Db 72 CCCTGGGCAGTTGTACTCATGGGGGCTTAAGATGCAGCTACCTCAG 117



RESULT 34 
AAQ60875/c 
ID AAQ60875 standard; DNA; 313 BP.



XX 

AC AAQ60875; 
XX 

DT 25-MAR-2003 (revised) 
DT 16-MAR-1994 (first entry) 
XX 

DE Human brain Expressed Sequence Tag EST00969. 
XX 

KW Gene transcription product; genetic markers; tagging; in vivo; 
KW transcription; mapping; locations; chromosomes; chromosomal; ss.
XX 

OS Homo sapiens. 
XX 

PN WO9316178-A2.
XX 

PD 19-AUG-1993. 
XX 

PF 12-FEB-1993; 93WO-US001294.
XX

PR 12-FEB-1992; 92US-00837195.
XX 

PA (USSH ) US DEPT HEALTH & HUMAN SERVICE. 
XX 

PI Venter CJ, Adams MD, Moreno RF; 
XX 

DR WPI; 1993-272882/34. 
XX 
PT Enriched oligonucleotides and corresp. sequences - used as markers for
PT human genes transcribed in-vivo, facilitate tagging of most human genes.
XX

PS Example 4; Page 404; 500pp; English.
XX

CC The Expressed Sequence Tag was isolated from a human brain cDNA library
CC as part of a large set of ESTs which can be used as markers for human
CC genes transcribed in vivo. They can be used to facilitate tagging of most
CC human genes, for mapping locations of expressed genes on chromosomes, for
CC individual or forensic identification, for mapping locations of disease-
CC associated genes, for identification of tissue type, and for prepn. of
CC antisense sequences, probes and constructs. EST00969 has a "poor" coding
CC probability as evaluated using the coding-region prediction program CRM.
CC See also AAQ59041-Q61440. (Updated on 25-MAR-2003 to correct PN field.)
XX

SQ Sequence 313 BP; 75 A; 106 C; 90 G; 41 T; 0 U; 1 Other;

Query Match 2.3%; Score 35.6; DB 2; Length 313;

Best Local Similarity 58.5%; Pred. No. 2.5;

Matches 62; Conservative 0; Mismatches 44; Indels 0; Gaps 0;

Qy 780 CTGATTTCTGCTCTCCCCTTCCTTGACTCGCCCACCACCTGTCCTGTGTAGATGGAGAAG 839

I I I I || I I I I I I I I 1 I I 1 I I I I I I I I I I I M I I I I I I
Db 250 CTGAAGCCGGGGCTCCCCCTGCCTGCCTCTCTCTCCTCCTCCCCTCTGGGAATTGGGCAG 191

Qy 840 GCTCGGAGAGTGGGGGTGCTGGGGGCACAAAATGGAATGAACACTG 885

I M I I I I I I I I I II I I I I I I I I I I I
Db 190 CCCTGGGCAGTTGTACTCATGGGGGCTTAAGATGCAGCTACCTCAG 145



RESULT 35
ABN94395/c

ID ABN94395 standard; DNA; 330 BP.
XX

AC ABN94395;
XX

DT 13-AUG-2002 (first entry)
XX

DE Gene #893 used to diagnose liver cancer.
XX
KW Gene; liver cancer; ds; hepatocellular carcinoma; hepatotropic;
KW metastatic liver tumour; cytostatic; expression profile; disease state;
KW disease progression; drug toxicity; drug efficacy; drug metabolism.
XX 

OS Homo sapiens. 
XX 

PN WO200229103-A2.
XX

PD 11-APR-2002.
XX

PF 02-OCT-2001; 2001WO-US030589.
XX

PR 02-OCT-2000; 2000US-0237054P.
XX 

PA (GENE-) GENE LOGIC INC. 
XX 

PI Horne D, Alvares C, Peres-Da-Silva S, Vockley JG;
XX 

DR WPI; 2002-426119/45. 
XX 

PT Diagnosing and detecting the progression of liver cancer, hepatocellular 
PT carcinoma or metastatic liver tumor in a patient, involves detecting the 
PT level of expression of two or more genes in a liver tissue sample. 
XX 

PS Claim 1; SEQ ID NO 893; 298pp; English. 
XX 

CC The invention relates to a novel method for diagnosing and detecting the 
CC progression of liver cancer, hepatocellular carcinoma or metastatic liver 
CC tumour in a patient, and differentiating metastatic liver cancer from 
CC hepatocellular carcinoma in a patient, involving detecting the level of 
CC expression of two or more genes represented in ABN93503-ABN97455 in a 
CC tissue sample. The method of the invention has hepatotropic, and 
CC cytostatic activity. The method is useful for diagnosing and detecting 
CC the progression of liver cancer, hepatocellular carcinoma and metastatic 
CC liver carcinoma in a patient. The method is useful for identifying 
CC expression profiles which serve as useful diagnostic markers as well as 
CC markers that can be used to monitor disease states, disease progression, 
CC drug toxicity, drug efficacy and drug metabolism. Note: The sequence data 
CC for this patent did not form part of the printed specification, but was 
CC obtained in electronic format directly from WIPO at 
CC ftp.wipo.int/pub/published_pct_sequences
XX 

SQ Sequence 330 BP; 77 A; 104 C; 88 G; 61 T; 0 U; 0 Other; 

Query Match 2.3%; Score 35.6; DB 6; Length 330; 

Best Local Similarity 58.5%; Pred. No. 2.6; 

Matches 62; Conservative 0; Mismatches 44; Indels 0; Gaps 0;

Qy 780 CTGATTTCTGCTCTCCCCTTCCTTGACTCGCCCACCACCTGTCCTGTGTAGATGGAGAAG 839

MM || I I I II I I I I I I I I I I I I 1 M I I I I I I I I I I I
Db 282 CTGAAGCCGGGGCTCCCCCTGCCTGCCTCTCTCTCCTCCTCACCTCTGGGAATTGGGCAG 223

Qy 840 GCTCGGAGAGTGGGGGTGCTGGGGGCACAAAATGGAATGAACACTG 885

I M I I I I I I I I I I I I II Ml I I I I I
Db 222 CCCTGGGCAGTTGTACTCATGGGGGCTTAAGATGCAGCTACCTCAG 177



RESULT 36
ABL87579/c 

ID ABL87579 standard; cDNA; 440 BP. 
XX 

AC ABL87579; 
XX 

DT 17-MAY-2002 (first entry) 
XX 

DE Human ovarian cancer related cDNA clone SEQ ID NO: 10557. 
XX 

KW Human; ovarian cancer; ovarian tumour; cytostatic; gene; ss. 
XX 

OS Homo sapiens. 
XX 

PN WO200192581-A2. 
XX 

PD 06-DEC-2001. 
XX 

PF 29-MAY-2001; 2001WO-US017756.
XX

PR 26-MAY-2000; 2000US-0207484P.
XX 

PA (CORI-) CORIXA CORP. 
XX 

PI Algate PA, Harlocker SL, Jones R; 
XX 

DR WPI; 2002-122075/16. 
XX 

PT Composition for therapy and diagnosis of ovarian cancer comprising 

PT polypeptide of a ovarian tumor polypeptide, polynucleotide encoding 

PT polypeptide, antibody specific to polypeptide or T cell expressing 

PT polypeptide. 
XX 

PS Claim 1; SEQ ID NO 10557; 489pp; English. 
XX 

CC The present invention describes a composition (I) comprising: carriers 

CC and immunostimulants; and a polypeptide (II) of a ovarian tumour

CC polypeptide encoded by a polynucleotide (III) having a cDNA sequence (S1)

CC from the 10912 nucleotide sequences as given in ABL77023 to ABL87934, 

CC (III) encoding (II) having a sequence (S2), a T cell population of (II), 

CC or antigen presenting cells that express (II). (I) has cytostatic

CC activity. An oligonucleotide (IV) that hybridises to (S1) can be used for

CC detecting ovarian cancer in a patient's biological sample preferably 

CC serum or ovarian tissue. The method comprises contacting a biological 

CC sample from a patient with (IV), detecting the amount of polynucleotide 

CC hybridising to (IV) and comparing the amount to a predetermined cutoff 

CC value and thereby detecting ovarian cancer in the patient, where the 



CC amount of polynucleotide hybridising to (IV) is detected preferably by 

CC polymerase chain reaction (PCR). (I) comprising (III) and/or (II) is

CC useful for stimulating and/or expanding T cells specific for an ovarian

CC tumour protein comprising contacting T cells with (III) or (II). (III) is

CC useful in design and preparation of ribozyme molecules for inhibiting 

CC expression of the tumour polypeptides and proteins in tumour cells; and 

CC to isolate a full length gene from a suitable library e.g., a tumour cDNA 

CC library using well known techniques 
XX 

SQ Sequence 440 BP; 97 A; 154 C; 131 G; 56 T; 0 U; 2 Other; 

Query Match 2.3%; Score 35.6; DB 6; Length 440; 

Best Local Similarity 58.5%; Pred. No. 3.1; 

Matches 62; Conservative 0; Mismatches 44; Indels 0; Gaps 0; 

Qy 780 CTGATTTCTGCTCTCCCCTTCCTTGACTCGCCCACCACCTGTCCTGTGTAGATGGAGAAG 839

I I I I II I I I I I I I I I I I I 1 I I I I I I I I I I I I I I I I I I
Db 261 CTGAAGCCGGGGCTCCCCCTGCCTGCCTCTCTCTCCTCCTCCCCTCTGGGAATTGGGCAG 202

Qy 840 GCTCGGAGAGTGGGGGTGCTGGGGGCACAAAATGGAATGAACACTG 885

I II I I I I I I I I I II I I I I II I I I I I
Db 201 CCCTGGGCAGTTGTACTCATGGGGGCTTAAGATGCAGCTACCTCAG 156



RESULT 37 
AAC76886/c 

ID AAC76886 standard; cDNA; 1333 BP. 
XX 

AC AAC76886;
XX 

DT 08-FEB-2001 (first entry) 
XX 

DE Human ORFX ORF2441 polynucleotide sequence SEQ ID NO: 4881. 
XX 

KW Human; open reading frame; ORFX; detection; cytostatic; hepatotropic; 

KW vulnerary; antipsoriatic; antiparkinsonian; nootropic; neuroprotective; 

KW anticonvulsant; osteopathic; antiarthritic; immunosuppressant; cardiant; 

KW immunostimulant; thrombolytic; coagulant; vasotropic; antidiabetic; 

KW hypotensive; dermatological; immunosuppressive; antiinflammatory;

KW antiviral; antibacterial; antifungal; antirheumatic; antithyroid; 

KW antianaemic; gene therapy; cancer; proliferative disorder; hypertension; 

KW neurodegenerative disorder; osteoarthritis; graft vs host disease; 

KW cardiovascular disease; diabetes mellitus; hypothyroidism; SCID; AIDS; 

KW cholesterol ester storage; systemic lupus erythematosus; infection; 

KW severe combined immunodeficiency; malaria; autoimmune disorder; asthma; 

KW allergy; aplastic anaemia; nocturnal haemoglobinuria; burn; wound; 

KW bone damage; cartilage damage; antiinflammatory disease; coagulation; 

KW thrombosis; contraceptive; ss. 

XX 

OS Homo sapiens. 
XX 

PN WO200058473-A2. 
XX 

PD 05-OCT-2000. 
XX 

PF 31-MAR-2000; 2000WO-US008621.
XX 



PR 31-MAR-1999; 99US-0127607P.

PR 02-APR-1999; 99US-0127636P.

PR 05-APR-1999; 99US-0127728P.

PR 30-MAR-2000; 2000US-00540763.
XX 

PA (CURA-) CURAGEN CORP. 
XX 

PI Shimkets RA, Leach M; 
XX 

DR WPI; 2000-602362/57. 

DR P-PSDB; AAB42677. 
XX 

PT Novel nucleic acids and peptides derived from open reading frame X, 

PT useful for treating e.g. cancers, proliferative disorders, 

PT neurodegenerative disorders and cardiovascular disease. 
XX 

PS Claim 5; Page 4060-4061; 5507pp; English. 
XX 

CC AAC74446 to AAC77606 encode the proteins given in AAB40237 to AAB43397, 

CC which represent the human ORFX open reading frames 1 to 3161. The ORFX 

CC sequences have activities such as: cytostatic; hepatotropic; vulnerary; 

CC antipsoriatic; antiparkinsonian; nootropic; neuroprotective; osteopathic; 

CC anticonvulsant; antiarthritic; immunosuppressant; immunostimulant;

CC cardiant; thrombolytic; coagulant; vasotropic; antidiabetic; hypotensive;

CC dermatological; immunosuppressive; antiinflammatory; antibacterial;

CC antiviral; antifungal; antirheumatic; antithyroid; and antianaemic. The 

CC sequences can be used for determining the presence of or predisposition 

CC to, or preventing or treating pathological conditions associated with an 

CC ORFX-associated disorder. The nucleic acids can be used to express ORFX 

CC proteins in gene therapy vectors. The proteins and nucleic acids may be 

CC used to treat cancers, proliferative disorders, neurodegenerative 

CC disorders, osteoarthritis, graft vs host disease, cardiovascular disease, 

CC diabetes mellitus, hypertension, hypothyroidism, cholesterol ester 

CC storage, systemic lupus erythematosus, severe combined immunodeficiency 

CC (SCID), AIDS, viral, bacterial or fungal infection, malaria, autoimmune 

CC disorders, asthma, allergies, aplastic anaemia, burns, wounds, bone and 

CC cartilage damage, nocturnal haemoglobinuria, antiinflammatory disease; to 

CC enhance coagulation; to inhibit thrombosis; and as a contraceptive 

XX 

SQ Sequence 1333 BP; 268 A; 380 C; 396 G; 286 T; 0 U; 3 Other; 

Query Match 2.3%; Score 35.6; DB 3; Length 1333; 

Best Local Similarity 46.7%; Pred. No. 5.8; 



Matches 113; Conservative 0; Mismatches 129; Indels 0; Gaps 0;

Qy 646 GGTTGTCTGTCCAGCAGATCAGGGTGAAAGTGGACAGTCTGTAACAACAGTGAGTCGTTC 705

M 1 1 1 1 1 1 III 1 1 1 1 1 1 1 1 Ml M 1
Db 1285 GGCAGGCTGTTCTCTGGTTCCAACTACTTGCCCACAGGATCTCTAAAGACCCAGGAATGG 1226

Qy 706 CTCCTCCTCCTCCTGCGCAGGGCAGAGCCTGGACATTAAAACATGCCCTGCCTGAAGCCG 765

II II 1 II 1 1 1 1 1 1 1 1 1 MM II III 1
Db 1225 GGGCTATTGCCAGGGGTTAGAAGAGAACCAGGTCCCAAGGGCATGGTGGGCGGGCAGATG 1166

Qy 766 CTTGCTGCTTCTCACTGATTTCTGCTCTCCCCTTCCTTGACTCGCCCACCACCTGTCCTG 825

II 1 1 III 1 II 1 1 II IMM II M 1 1 1 M 1
Db 1165 GTTCCAGAGCCTTAGAGATTCATAGGTTCTTCCTCCTCCACCAGCTGCTCCGAGGGCCTG 1106

Qy 826 TGTAGATGGAGAAGGCTCGGAGAGTGGGGGTGCTGGGGGCACAAAATGGAATGAACACTG 885

|| M II I I M I I I I I I I I I I I I Ml I

Db 1105 TGGGGAGGGACAAGGGTGGGATGCTGGAGCACCAGGGCTGCAGCAAGGGCCTTAGCTAAG 1046

Qy 886 CT 887

I I

Db 1045 CT 1044



RESULT 38 
AAH18291 

ID AAH18291 standard; cDNA; 2474 BP. 
XX 

AC AAH18291; 
XX 

DT 26-JUN-2001 (first entry) 
XX 

DE Human cDNA sequence SEQ ID NO: 18274. 
XX 

KW Human; primer; detection; diagnosis; antisense therapy; gene therapy; ss. 
XX 

OS Homo sapiens. 
XX 

PN EP1074617-A2. 
XX 

PD 07-FEB-2001. 
XX 

PF 28-JUL-2000; 2000EP-00116126.
XX

PR 29-JUL-1999; 99JP-00248036.

PR 27-AUG-1999; 99JP-00300253.

PR 11-JAN-2000; 2000JP-00118776.

PR 02-MAY-2000; 2000JP-00183767.

PR 09-JUN-2000; 2000JP-00241899.
XX 

PA (HELI-) HELIX RES INST. 
XX 

PI Ota T, Isogai T, Nishikawa T, Hayashi K, Saito K, Yamamoto J; 

PI Ishii S, Sugiyama T, Wakamatsu A, Nagai K, Otsuki T; 

XX 

DR WPI; 2001-318749/34. 
XX 

PT Primer sets for synthesizing polynucleotides, particularly the 5602 full- 

PT length cDNAs defined in the specification, and for the detection and/or 

PT diagnosis of the abnormality of the proteins encoded by the full-length 

PT cDNAs.
XX 

PS Claim 8; SEQ ID NO 18274; 2537pp + Sequence Listing; English. 
XX 

CC The present invention describes primer sets for synthesising 5602 full- 

CC length cDNAs defined in the specification. Where a primer set comprises: 

CC (a) an oligo-dT primer and an oligonucleotide complementary to the 

CC complementary strand of a polynucleotide which comprises one of the 5602 

CC nucleotide sequences defined in the specification, where the 

CC oligonucleotide comprises at least 15 nucleotides; or (b) a combination 

CC of an oligonucleotide comprising a sequence complementary to the 

CC complementary strand of a polynucleotide which comprises a 5'-end

CC sequence and an oligonucleotide comprising a sequence complementary to a

CC polynucleotide which comprises a 3'-end sequence, where the

CC oligonucleotide comprises at least 15 nucleotides and the combination of

CC the 5'-end sequence/3'-end sequence is selected from those defined in the

CC specification. The primer sets can be used in antisense therapy and in

CC gene therapy. The primers are useful for synthesising polynucleotides,

CC particularly full-length cDNAs. The primers are also useful for the

CC detection and/or diagnosis of the abnormality of the proteins encoded by

CC the full-length cDNAs. The primers allow obtaining of the full-length

CC cDNAs easily without any specialised methods. AAH03166 to AAH13628 and 

CC AAH13633 to AAH18742 represent human cDNA sequences; AAB92446 to AAB95893 

CC represent human amino acid sequences; and AAH13629 to AAH13632 represent 

CC oligonucleotides, all of which are used in the exemplification of the 

CC present invention 

XX 

SQ Sequence 2474 BP; 468 A; 861 C; 716 G; 429 T; 0 U; 0 Other; 

Query Match 2.3%; Score 35.6; DB 4; Length 2474; 

Best Local Similarity 58.5%; Pred. No. 8.4; 

Matches 62; Conservative 0; Mismatches 44; Indels 0; Gaps 0; 



Qy 780 CTGATTTCTGCTCTCCCCTTCCTTGACTCGCCCACCACCTGTCCTGTGTAGATGGAGAAG 839

1 1 1 1 t 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 Mill 1 1 1 1 1 1
Db 2208 CTGAAGCCGGGGCTCCCCCTGCCTGCCTCTCTCTCCTCCTCCCCTCTGGGAATTGGGCAG 2267

Qy 840 GCTCGGAGAGTGGGGGTGCTGGGGGCACAAAATGGAATGAACACTG 885

I II 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
Db 2268 CCCTGGGCAGTTGTACTCATGGGGGCTTAAGATGCAGCTACCTCAG 2313





RESULT 39
AAS42019

ID AAS42019 standard; DNA; 21632 BP.
XX

AC AAS42019;
XX

DT 17-DEC-2001 (first entry)
XX

DE Genomic sequence #335 encoding novel human enzyme polypeptide.
XX

KW Human; oxidoreductase enzyme; transferase; hydrolase; lyase; isomerase;
KW ligase; hyperproliferative disorder; immunodeficiency disorder;
KW autoimmune disorder; neurological disorder; metabolic disorder;
KW inflammatory disorder; cardiovascular disorder; reproductive disorder;
KW blood-related disorder; infectious disorder; gene therapy; cytostatic;
KW antiarthritic; nephrotropic; anticoagulant; ds.
XX

OS Homo sapiens.
XX

PN WO200155301-A2.
XX

PD 02-AUG-2001.
XX

PF 17-JAN-2001; 2001WO-US001239.
XX

PR 31-JAN-2000; 2000US-0179065P.
PR 04-FEB-2000; 2000US-0180628P.
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2000US-0226279P. 

2000US-0226681P. 

2000US-0226868P. 

2000US-0227182P. 

2000US-0227009P. 

2000US-0228924P. 

2000US-0229287P. 

2000US-0229343P. 

2000US-0229344P. 

2000US-0229345P. 

2000US-0229509P. 

2000US-0229513P. 

2000US-0230437P. 

2000US-0230438P. 

2000US-0231242P. 

2000US-0231243P. 

2000US-0231244P. 

2000US-0231413P 

2000US-0231414P 

2000US-0232080P 

2000US-0232081P 

2000US-0231968P 

2000US-0232397P 

2000US-0232398P 

2000US-0232399P 

2000US-0232400P 

2000US-0232401P 

2000US-0233063P 

2000US-0233064P 



PR 14-SEP-2000; 2000US-0233065P.
PR 21-SEP-2000; 2000US-0234223P.
PR 21-SEP-2000; 2000US-0234274P.
PR 25-SEP-2000; 2000US-0234997P.
PR 25-SEP-2000; 2000US-0234998P.
PR 26-SEP-2000; 2000US-0235484P.
PR 27-SEP-2000; 2000US-0235834P.
PR 27-SEP-2000; 2000US-0235836P.
PR 29-SEP-2000; 2000US-0236327P.
PR 29-SEP-2000; 2000US-0236367P.
PR 29-SEP-2000; 2000US-0236368P.
PR 29-SEP-2000; 2000US-0236369P.
PR 29-SEP-2000; 2000US-0236370P.
PR 02-OCT-2000; 2000US-0236802P.
PR 02-OCT-2000; 2000US-0237037P.
PR 02-OCT-2000; 2000US-0237038P.
PR 02-OCT-2000; 2000US-0237039P.
PR 02-OCT-2000; 2000US-0237040P.
PR 13-OCT-2000; 2000US-0239935P.
PR 13-OCT-2000; 2000US-0239937P.
PR 20-OCT-2000; 2000US-0240960P.
PR 20-OCT-2000; 2000US-0241221P.
PR 20-OCT-2000; 2000US-0241785P.
PR 20-OCT-2000; 2000US-0241786P.
PR 20-OCT-2000; 2000US-0241787P.
PR 20-OCT-2000; 2000US-0241808P.
PR 20-OCT-2000; 2000US-0241809P.
PR 20-OCT-2000; 2000US-0241826P.
PR 01-NOV-2000; 2000US-0244617P.
PR 08-NOV-2000; 2000US-0246474P.
PR 08-NOV-2000; 2000US-0246475P.
PR 08-NOV-2000; 2000US-0246476P.
PR 08-NOV-2000; 2000US-0246477P.
PR 08-NOV-2000; 2000US-0246478P.
PR 08-NOV-2000; 2000US-0246523P.
PR 08-NOV-2000; 2000US-0246524P.
PR 08-NOV-2000; 2000US-0246525P.
PR 08-NOV-2000; 2000US-0246526P.
PR 08-NOV-2000; 2000US-0246527P.
PR 08-NOV-2000; 2000US-0246528P.
PR 08-NOV-2000; 2000US-0246532P.
PR 08-NOV-2000; 2000US-0246609P.
PR 08-NOV-2000; 2000US-0246610P.
PR 08-NOV-2000; 2000US-0246611P.
PR 08-NOV-2000; 2000US-0246613P.
PR 17-NOV-2000; 2000US-0249207P.
PR 17-NOV-2000; 2000US-0249208P.
PR 17-NOV-2000; 2000US-0249209P.
PR 17-NOV-2000; 2000US-0249210P.
PR 17-NOV-2000; 2000US-0249211P.
PR 17-NOV-2000; 2000US-0249212P.
PR 17-NOV-2000; 2000US-0249213P.
PR 17-NOV-2000; 2000US-0249214P.
PR 17-NOV-2000; 2000US-0249215P.
PR 17-NOV-2000; 2000US-0249216P.
PR 17-NOV-2000; 2000US-0249217P.
PR 17-NOV-2000; 2000US-0249218P.



PR 17-NOV-2000; 2000US-0249244P.

PR 17-NOV-2000; 2000US-0249245P.

PR 17-NOV-2000; 2000US-0249264P.

PR 17-NOV-2000; 2000US-0249265P.

PR 17-NOV-2000; 2000US-0249297P.

PR 17-NOV-2000; 2000US-0249299P.

PR 17-NOV-2000; 2000US-0249300P.

PR 01-DEC-2000; 2000US-0250160P.

PR 01-DEC-2000; 2000US-0250391P.

PR 05-DEC-2000; 2000US-0251030P.

PR 05-DEC-2000; 2000US-0251988P.

PR 05-DEC-2000; 2000US-0256719P.

PR 06-DEC-2000; 2000US-0251479P.

PR 08-DEC-2000; 2000US-0251856P.

PR 08-DEC-2000; 2000US-0251868P.

PR 08-DEC-2000; 2000US-0251869P.

PR 08-DEC-2000; 2000US-0251989P.

PR 08-DEC-2000; 2000US-0251990P.

PR 11-DEC-2000; 2000US-0254097P.

PR 05-JAN-2001; 2001US-0259678P.
XX 

PA (HUMA-) HUMAN GENOME SCI INC. 
XX 

PI Rosen CA, Barash SC, Ruben SM; 
XX 

DR WPI; 2001-465566/50. 
XX 

PT Novel polypeptides and polynucleotides useful for diagnosing, preventing, 

PT treating neural, immune system, muscular, reproductive, pulmonary, 

PT cardiovascular, renal, proliferative disorders and cancerous diseases. 

XX 

PS Disclosure; SEQ ID NO 2145; 1180pp; English. 
XX 

CC The present invention relates to the isolation of novel human enzyme 

CC polypeptides (AAU22915-AAU23814), and the cDNA and genomic sequences

CC encoding them. The enzyme polypeptides of the invention may comprise the

CC functional classes of oxidoreductases, transferases, hydrolases, lyases,

CC isomerases or ligases. The sequences of the invention are useful in the

CC diagnosis, treatment, prevention and/or prognosis of a wide range of

CC disorders including hyperproliferative disorders (e.g. cancer),

CC immunodeficiency disorders (e.g. AIDS) autoimmune disorders (e.g.

CC arthritis), neurological disorders (e.g. Alzheimer's disease), metabolic

CC disorders (e.g. phenylketonuria), inflammatory disorders (e.g. asthma), 

CC cardiovascular disorders (e.g. atherosclerosis), blood-related disorders 

CC (e.g. haemophilia), reproductive disorders (e.g. infertility) and 

CC infectious disorders (e.g. Influenza). The polynucleotides of the 

CC invention can also be used in gene therapy. AAS41685-AAS42192 represent 

CC DNA sequences encoding for the novel human enzyme polypeptides of the 

CC invention. Note: The sequence data for this patent did not form part of 

CC the printed specification, but was obtained in electronic format directly 

CC from WIPO at ftp.wipo.int/pub/published_pct_sequences 

XX 

SQ Sequence 21632 BP; 4707 A; 6164 C; 6039 G; 4722 T; 0 U; 0 Other; 

Query Match 2.3%; Score 35.6; DB 4; Length 21632; 

Best Local Similarity 46.7%; Pred. No. 30; 

Matches 113; Conservative 0; Mismatches 129; Indels 0; Gaps 0;

Qy 646 GGTTGTCTGTCCAGCAGATCAGGGTGAAAGTGGACAGTCTGTAACAACAGTGAGTCGTTC 705

I I I I I I I I III I I I I I I I I III II I 

Db 10501 GGCAGGCTGTTCTCTGGTTCCAACTACTTGCCCACAGGATCTCTAAAGACCCAGGAATGG 10560

Qy 706 CTCCTCCTCCTCCTGCGCAGGGCAGAGCCTGGACATTAAAACATGCCCTGCCTGAAGCCG 765 

I I II I II I I I I I I I I I I I I I II III I 

Db 10561 GGGCTATTGCCAGGGGTTAGAAGAGAACCAGGTCCCAAGGGCATGGTGGGCGGGCAGATG 10620 

Qy 766 CTTGCTGCTTCTCACTGATTTCTGCTCTCCCCTTCCTTGACTCGCCCACCACCTGTCCTG 825 

I I I I III I I II I II I I I I I II II I I II I I 

Db 10621 GTTCCAGAGCCTTAGAGATTCATAGGTTCTTCCTCCTCCACCAGCTGCTCCGAGGGCCTG 10680 

Qy 826 TGTAGATGGAGAAGGCTCGGAGAGTGGGGGTGCTGGGGGCACAAAATGGAATGAACACTG 885

II II I I I I I I I I I I I I I I I I I I I I I I I III I 

Db 10681 TGGGGAGGGACAAGGGTGGGATGCTGGAGCACCAGGGCTGCAGCAAGGGCCTTAGCTAAG 10740 

Qy 886 CT 887 

I I 

Db 10741 CT 10742 



RESULT 40
AAC89560/c

ID AAC89560 standard; DNA; 122186 BP.
XX

AC AAC89560;
XX

DT 08-MAR-2001 (first entry)
XX

DE Human histone deacetylase HDAC-D coding sequence.
XX

KW Histone deacetylase; HDAC-1; HDAC-2; HDAC-3; HDAC-4; HDAC-5; HDAC-C;

KW HDAC-D; cell cycle; tumourigenesis; cancer; inhibitor; antisense;

KW gene therapy; ds.
XX

OS Homo sapiens.
XX

PN WO200071703-A2.
XX

PD 30-NOV-2000.
XX

PF 03-MAY-2000; 2000WO-IB001252.
XX

PR 03-MAY-1999; 99US-0132287P.
XX

PA (METH-) METHYLGENE INC.
XX

PI Macleod AR, Li Z, Besterman JM;
XX

DR WPI; 2001-016407/02.
XX

PT Antisense oligonucleotide that inhibits expression of a histone

PT deacetylase, useful for treating and/or alleviating the symptoms of

PT neoplasia, or for inhibiting neoplastic cell growth in an animal.
XX

PS Disclosure; Page 89-125; 125pp; English.
XX

CC The present invention provides inhibitors of histone deacetylase enzymes 

CC such as HDAC-1, HDAC-2, HDAC-3, HDAC-4, HDAC-5, HDAC-C and HDAC-D. These 

CC inhibitors may be antisense strands or they may be compounds identified 

CC by contacting the enzyme with the compound and measuring the resulting 

CC enzyme activity. These inhibitors are useful for treating cancers and for 

CC identifying which histone deacetylase is involved in a neoplasia 
XX 

SQ Sequence 122186 BP; 29016 A; 31077 C; 32425 G; 29668 T; 0 U; 0 Other; 



Query Match 2.3%; Score 35.6; DB 4; Length 122186; 

Best Local Similarity 46.7%; Pred. No. 81; 

Matches 113; Conservative 0; Mismatches 129; Indels 0; Gaps 0;

Qy 646 GGTTGTCTGTCCAGCAGATCAGGGTGAAAGTGGACAGTCTGTAACAACAGTGAGTCGTTC 705 

I I I I I I I I III I I MM II III II I 
Db 107424 GGCAGGCTGTTCTCTGGTTCCAACTACTTGCCCACAGGATCTCTAAAGACCCAGGAATGG 107365

Qy 706 CTCCTCCTCCTCCTGCGCAGGGCAGAGCCTGGACATTAAAACATGCCCTGCCTGAAGCCG 765 

II II I II I II I I I I I I II II M III I 

Db 107364 GGGCTATTGCCAGGGGTTAGAAGAGAACCAGGTCCCAAGGGCATGGTGGGCGGGCAGATG 107305

Qy 766 CTTGCTGCTTCTCACTGATTTCTGCTCTCCCCTTCCTTGACTCGCCCACCACCTGTCCTG 825

MM III I I I I I II I I I I I II II I I I I I I 

Db 107304 GTTCCAGAGCCTTAGAGATTCATAGGTTCTTCCTCCTCCACCAGCTGCTCCGAGGGCCTG 107245

Qy 826 TGTAGATGGAGAAGGCTCGGAGAGTGGGGGTGCTGGGGGCACAAAATGGAATGAACACTG 885

II II I II I I II I I I I I I I I I I I I II II III I 
Db 107244 TGGGGAGGGACAAGGGTGGGATGCTGGAGCACCAGGGCTGCAGCAAGGGCCTTAGCTAAG 107185

Qy 886 CT 887 

I I 

Db 107184 CT 107183 



RESULT 41 
ABN73289/c 

ID ABN73289 standard; cDNA; 639 BP. 
XX 

AC ABN73289; 
XX 

DT 03-JUL-2002 (first entry) 
XX 

DE Bovine embryonic germ (EG) cell cDNA EST 000203a CONTIG 63. 
XX 

KW Bovine; Bos taurus; EST; expressed sequence tag; totipotence; 

KW development; gene; ss. 

XX 

OS Bos taurus.
XX 

PN WO200194550-A2. 
XX 

PD 13-DEC-2001. 



XX 

PF 07-JUN-2001; 2001WO-US018576. 
XX 

PR 07-JUN-2000; 2000US-0209874P.

PR 06-JUN-2001; 2001US-00876143.
XX

PA (INFI-) INFIGEN INC.
XX

PI Eilertsen KJ, Pfister-Genskow M, Childs L;
XX

DR WPI; 2002-351289/38.
XX

PT An expressed sequence tag (EST), the expression of which, or its

PT complementary sequence, in a cell identifies the cell as a 

PT developmentally competent or incompetent cell. 
XX 

PS Example 16; Page 143-144; 584pp; English. 
XX 

CC The present invention describes an expressed sequence tag (EST), where 

CC the EST is an isolated, enriched, or purified nucleic acid sequence 

CC representing all or part of a gene, the expression of which, or its 

CC complementary sequence, in a cell identifies the cell as a 

CC developmentally competent or incompetent cell. Molecules which induce 

CC developmental competence in a cell line are useful for inducing 

CC totipotence in one or more cells. Molecules which induce developmental 

CC incompetence in a cell line are useful for preventing a full term 

CC pregnancy in an animal and inhibiting totipotence. The molecules are also 

CC useful for treating a disease in an animal by inducing development of one 

CC or more cells of the animal into a specific cell type. The present 

CC sequence represents a bovine EST which is given in the exemplification of 

CC the present invention 

XX 

SQ Sequence 639 BP; 149 A; 184 C; 142 G; 156 T; 0 U; 8 Other; 

Query Match 2.3%; Score 35.4; DB 6; Length 639; 
Best Local Similarity 51.5%; Pred. No. 4.4; 

Matches 69; Conservative 2; Mismatches 63; Indels 0; Gaps 0; 

Qy 1185 GAACATCAAATCATGCCAGCAGAAGTGGGACAGGCAAATCCTCAAAGATGTCTCCTTGTA 1244

I I I I I I I I I I I II II I I I I I I I II II I I I I I 

Db 425 GAANCTNCCGTCCTGCAAGTCAGAGTGGGACACACAAAGTCTGCTGTTTGTCGGCAGANC 366 

Qy 1245 CATCGAGAGTGGCCAGATTATGTGCATCTTAGGCAGCTCAGGTAAGTGCCTGGGGGGSCS 1304

II IN I I I I I I I I I I I I I I I I I I I I I I I I I I : : 

Db 365 CACTTCAAGTACGNAAATNAAGAGCAGCATGAAGAGATCTGGTGAAATTCTGGGGGGGAG 306

Qy 1305 GGGGCTCCTGTACT 1318 

II III II 

Db 305 AAGGGAGCTGCTCT 292



RESULT 42
AAL18444

ID AAL18444 standard; cDNA; 414 BP.
XX

AC AAL18444;
XX 



DT 07-DEC-2001 (first entry) 
XX 

DE Human breast cancer expressed polynucleotide 10901. 
XX 

KW Human; breast cancer; cell marker; cytostatic; ss . 
XX 

OS Homo sapiens. 
XX 

PN WO200151628-A2. 
XX 

PD 19-JUL-2001. 
XX 

PF 10-JAN-2001; 2001WO-US000798 . 
XX 

PR 14-JAN-2000; 2000US-0176077P.

PR 14-MAR-2000; 2000US-0189167P.

PR 24-MAR-2000; 2000US-0192099P.

PR 29-MAR-2000; 2000US-0193480P.

PR 15-MAY-2000; 2000US-0205230P.

PR 09-JUN-2000; 2000US-0211315P.

PR 25-JUL-2000; 2000US-0220534P.
XX

PA (MILL-) MILLENNIUM PREDICTIVE MEDICINE INC.
XX

PI Lillie J, Xu Y, Wang Y, Steinmann K;
XX

DR WPI; 2001-451856/48.
XX

PT New peptide useful as a marker for the diagnosis of breast cancer.
XX

PS Claim 1; Page 1940-1941; 3695pp; English.
XX

CC The invention relates to human breast cancer expressed polynucleotides

CC (AAL07544-AAL26789) and methods of assessing whether a patient is

CC afflicted with breast cancer by examining the correlation between the

CC expression of certain markers and the cancerous state of breast cells.

CC The polynucleotides and encoded polypeptides are potential markers for

CC detecting, diagnosing, monitoring, characterising treating and

CC potentially preventing breast cancer. The polynucleotides and encoded

CC polypeptides are also useful for isolating compounds with cytostatic

CC activity

XX

SQ Sequence 414 BP; 144 A; 86 C; 85 G; 99 T; 0 U; 0 Other;

Query Match 2.2%; Score 35.2; DB 4; Length 414;
Best Local Similarity 62.5%; Pred. No. 3.9;

Matches 55; Conservative 0; Mismatches 33; Indels 0; Gaps 0;

Qy 1451 CCTAGTACCAAAGTGAAATCTTGAGGAAAATCCCTGGAAAGAGTGGAAAGTCCTGCCTAA 1510

| | | I I I I II I I I I I I I I I I I I II I I I I I I M I I I I I

Db 207 CTTAATACCTCCAGCAACCAGTTGTGACAATACATGCAAAGAGTGCAAAGTCTTGTCCAC 266

Qy 1511 CACGTAAGTGCCTTCTTTGCTTGTTTGA 1538

I I I I I I I I I I II I I I I I I I

Db 267 GACGGATGTTCTTTTTTTTTTTTTTTGA 294



RESULT 43 
AAL10158 

ID AAL10158 standard; cDNA; 416 BP. 
XX 

AC AAL10158; 
XX 

DT 07-DEC-2001 (first entry) 
XX 

DE Human breast cancer expressed polynucleotide 2615. 
XX 

KW Human; breast cancer; cell marker; cytostatic; ss . 
XX 

OS Homo sapiens.
XX 

PN WO200151628-A2. 
XX 

PD 19-JUL-2001. 
XX 

PF 10-JAN-2001; 2001WO-US000798.
XX

PR 14-JAN-2000; 2000US-0176077P.

PR 14-MAR-2000; 2000US-0189167P.

PR 24-MAR-2000; 2000US-0192099P.

PR 29-MAR-2000; 2000US-0193480P.

PR 15-MAY-2000; 2000US-0205230P.

PR 09-JUN-2000; 2000US-0211315P.

PR 25-JUL-2000; 2000US-0220534P.
XX 

PA (MILL-) MILLENNIUM PREDICTIVE MEDICINE INC. 
XX 

PI Lillie J, Xu Y, Wang Y, Steinmann K; 
XX 

DR WPI; 2001-451856/48. 
XX 

PT New peptide useful as a marker for the diagnosis of breast cancer. 
XX 

PS Claim 1; Page 495; 3695pp; English. 
XX 

CC The invention relates to human breast cancer expressed polynucleotides 

CC (AAL07544-AAL26789) and methods of assessing whether a patient is

CC afflicted with breast cancer by examining the correlation between the 

CC expression of certain markers and the cancerous state of breast cells. 

CC The polynucleotides and encoded polypeptides are potential markers for 

CC detecting, diagnosing, monitoring, characterising treating and 

CC potentially preventing breast cancer. The polynucleotides and encoded 

CC polypeptides are also useful for isolating compounds with cytostatic 

CC activity 

XX 

SQ Sequence 416 BP; 137 A; 92 C; 94 G; 93 T; 0 U; 0 Other; 

Query Match 2.2%; Score 35.2; DB 4; Length 416; 
Best Local Similarity 62.5%; Pred. No. 4; 

Matches 55; Conservative 0; Mismatches 33; Indels 0; Gaps 0; 

Qy 1451 CCTAGTACCAAAGTGAAATCTTGAGGAAAATCCCTGGAAAGAGTGGAAAGTCCTGCCTAA 1510

I I I I I I I II I I I I I I I I I I I I I I I I I II I I I I I I I I 

Db 221 CTTAATACCTCCAGCAACCAGTTGTGACAATACATGCAAAGAGTGCAAAGTCTTGTCCAC 280



Qy 1511 CACGTAAGTGCCTTCTTTGCTTGTTTGA 1538 

I I I I I I I I I I I I I I I I II I 
Db 281 GACGGATGTTCTTTTTTTTTTTTTTTGA 308 



RESULT 44
AAL09772

ID AAL09772 standard; cDNA; 459 BP.
XX

AC AAL09772;
XX

DT 07-DEC-2001 (first entry)
XX

DE Human breast cancer expressed polynucleotide 2229.
XX

KW Human; breast cancer; cell marker; cytostatic; ss.
XX

OS Homo sapiens.
XX

PN WO200151628-A2.
XX

PD 19-JUL-2001.
XX

PF 10-JAN-2001; 2001WO-US000798.
XX

PR 14-JAN-2000; 2000US-0176077P.

PR 14-MAR-2000; 2000US-0189167P.

PR 24-MAR-2000; 2000US-0192099P.

PR 29-MAR-2000; 2000US-0193480P.

PR 15-MAY-2000; 2000US-0205230P.

PR 09-JUN-2000; 2000US-0211315P.

PR 25-JUL-2000; 2000US-0220534P.
XX

PA (MILL-) MILLENNIUM PREDICTIVE MEDICINE INC.
XX

PI Lillie J, Xu Y, Wang Y, Steinmann K;
XX

DR WPI; 2001-451856/48.
XX

PT New peptide useful as a marker for the diagnosis of breast cancer.
XX

PS Claim 1; Page 428-429; 3695pp; English.
XX

CC The invention relates to human breast cancer expressed polynucleotides

CC (AAL07544-AAL26789) and methods of assessing whether a patient is

CC afflicted with breast cancer by examining the correlation between the

CC expression of certain markers and the cancerous state of breast cells.

CC The polynucleotides and encoded polypeptides are potential markers for

CC detecting, diagnosing, monitoring, characterising treating and

CC potentially preventing breast cancer. The polynucleotides and encoded

CC polypeptides are also useful for isolating compounds with cytostatic

CC activity

XX

SQ Sequence 459 BP; 148 A; 101 C; 102 G; 106 T; 0 U; 2 Other;

Query Match 2.2%; Score 35.2; DB 4; Length 459;
Best Local Similarity 62.5%; Pred. No. 4.2;

Matches 55; Conservative 0; Mismatches 33; Indels 0; Gaps 0;



Qy 1451 CCTAGTACCAAAGTGAAATCTTGAGGAAAATCCCTGGAAAGAGTGGAAAGTCCTGCCTAA 1510

I I I I I I I II I II I I I I I I I I I M I I I I I I I I I I I I I 

Db 250 CTTAATACCTCCAGCAACCAGTTGTGACAATACATGCAAAGAGTGCAAAGTCTTGTCCAC 309

Qy 1511 CACGTAAGTGCCTTCTTTGCTTGTTTGA 1538 

I I I I I I I II I II I I I II I I 
Db 310 GACGGATGTTCTTTTTTTTTTTTTTTGA 337 

RESULT 45
AAZ16007/c

ID AAZ16007 standard; cDNA; 760 BP. 
XX 

AC AAZ16007; 
XX 

DT 12-OCT-1999 (first entry) 
XX 

DE Human gene expression product cDNA sequence SEQ ID NO: 3476. 
XX 

KW Human; gene; gene expression product; diagnosis; therapy; probe; 

KW detection; mapping; tissue typing; profiling; forensic; cancer; 

KW genetic analysis; colorectal cancer; breast cancer; lung cancer; ss. 

XX 



OS Homo sapiens.
XX

PN WO9938972-A2.
XX

PD 05-AUG-1999.
XX

PF 28-JAN-1999; 99WO-US001619.
XX

PR 28-JAN-1998; 98US-0072910P.

PR 24-FEB-1998; 98US-0075954P.

PR 31-MAR-1998; 98US-0080114P.

PR 03-APR-1998; 98US-0080515P.

PR 03-APR-1998; 98US-0080666P.

PR 21-OCT-1998; 98US-0105234P.

PR 28-OCT-1998; 98US-0105877P.
XX 

PA (CHIR ) CHIRON CORP. 

PA (HYSE-) HYSEQ INC. 
XX 

PI Williams LT, Escobedo J, Innis MA, Garcia PD, Sudduth-Klinger J; 

PI Reinhard C, Giese K, Randazzo F, Kennedy GC, Pot D, Kassam A; 

PI Lamson G, Drmanac R, Crkvenjakov R, Dickson M, Drmanac S, Labat I; 

PI Leshkowitz D, Kita D, Garcia V, Jones WL, Stache-Crain B; 

XX 

DR WPI; 1999-494092/41. 
XX 

PT Novel human genes and their expression products which are differentially 

PT expressed in different cell types. 

XX 

PS Claim 1; Page 1661-1662; 2479pp; English. 



CC The present invention describes a library of human polynucleotides 

CC comprising the sequences given in AAZ12532 to AAZ17779. Also described is 

CC a method of detecting differentially expressed genes correlated with the 

CC cancerous state of a mammalian cell, comprising detecting at least one 

CC differentially expressed gene product in a test sample from a cell 

CC suspected of being cancerous, where the gene product is encoded by one of 

CC the 5248 polynucleotide sequences given in AAZ12532 to AAZ17779. The 

CC polynucleotides can be used as a source of primers and probes, which can 

CC be used for a variety of purposes, e.g. detection of expression levels,

CC mapping, tissue typing or profiling, forensics, genetic analysis and 

CC detection of polymorphisms. Polypeptides encoded by the polynucleotides 

CC can be used for raising antibodies for experimental, diagnostic and 

CC therapeutic purposes. The polynucleotides may also be used to construct 

CC arrays for diagnostics (which may be used to determine function of an 

CC encoded protein); and to detect differences in expression levels between

CC two cells (e.g. to identify abnormal or diseased tissue in a human, to 

CC identify a genetic predisposition or susceptibility to a disease such as 

CC cancer). The polynucleotides of the invention are especially used in the

CC diagnosis, prognosis and management of colorectal cancer, breast cancer, 

CC and lung cancer. The polynucleotides can also be used to screen for 

CC peptide analogues and antagonists 

XX 

SQ Sequence 760 BP; 169 A; 210 C; 174 G; 181 T; 0 U; 26 Other; 

Query Match 2.2%; Score 35.2; DB 2; Length 760; 

Best Local Similarity 52.6%; Pred. No. 5.6;

Matches 70; Conservative 0; Mismatches 63; Indels 0; Gaps 0; 

Qy 760 AAGCCGCTTGCTGCTTCTCACTGATTTCTGCTCTCCCCTTCCTTGACTCGCCCACCACCT 819

I I 1 I I I I I 1 I I I I I I I I II III IN M I I 

Db 705 AAGCCAGGGGCTCCNTTTTAATTCATTCAGGGGGTGGGTTTTTTNAAACGCAGGGCAACT 646

Qy 820 GTCCTGTGTAGATGGAGAAGGCTCGGAGAGTGGGGGTGCTGGGGGCACAAAATGGAATGA 879

I I I I II II I I I I I I I I I I I I I I I II I I I I I I I II 

Db 645 TTTTATATAAANTCGAGGGTGCCAGGAAAGTGGGCCTGCNGGGTGCANAAAAGCGCAAGA 586

Qy 880 ACACTGCTGAAGG 892

I II II I I 
Db 585 AGCTTGTGGAATG 573



RESULT 46
AAS68011/c

ID AAS68011 standard; cDNA; 2412 BP. 
XX 

AC AAS68011; 
XX 

DT 13-FEB-2002 (first entry) 
XX 

DE DNA encoding novel human diagnostic protein #3815. 
XX 

KW Human; chromosome mapping; gene mapping; gene therapy; forensic; 

KW food supplement; medical imaging; diagnostic; genetic disorder; ss. 
XX 

OS Homo sapiens.
XX

PN WO200175067-A2.
XX

PD 11-OCT-2001.
XX

PF 30-MAR-2001; 2001WO-US008631.
XX

PR 31-MAR-2000; 2000US-00540217.

PR 23-AUG-2000; 2000US-00649167.
XX 

PA (HYSE-) HYSEQ INC. 
XX 

PI Drmanac RT, Liu C, Tang YT; 
XX 

DR WPI; 2001-639362/73. 

DR P-PSDB; ABG03824. 
XX 

PT New isolated polynucleotide and encoded polypeptides, useful in 

PT diagnostics, forensics, gene mapping, identification of mutations 

PT responsible for genetic disorders or other traits and to assess 

PT biodiversity. 
XX 

PS Claim 1; SEQ ID NO 3815; 103pp; English. 
XX 

CC The invention relates to isolated polynucleotide (I) and polypeptide (II) 

CC sequences. (I) is useful as hybridisation probes, polymerase chain 

CC reaction (PCR) primers, oligomers, and for chromosome and gene mapping, 

CC and in recombinant production of (II). The polynucleotides are also used

CC in diagnostics as expressed sequence tags for identifying expressed

CC genes. (I) is useful in gene therapy techniques to restore normal

CC activity of (II) or to treat disease states involving (II). (II) is

CC useful for generating antibodies against it, detecting or quantitating a

CC polypeptide in tissue, as molecular weight markers and as a food

CC supplement. (II) and its binding partners are useful in medical imaging

CC of sites expressing (II). (I) and (II) are useful for treating disorders

CC involving aberrant protein expression or biological activity. The 

CC polypeptide and polynucleotide sequences have applications in 

CC diagnostics, forensics, gene mapping, identification of mutations 

CC responsible for genetic disorders or other traits to assess biodiversity 

CC and to produce other types of data and products dependent on DNA and 

CC amino acid sequences. AAS64197-AAS94564 represent novel human diagnostic

CC coding sequences of the invention. Note: The sequence data for this 

CC patent did not appear in the printed specification, but was obtained in 

CC electronic format directly from WIPO at 

CC ftp.wipo.int/pub/published_pct_sequences

XX 

SQ Sequence 2412 BP; 537 A; 743 C; 734 G; 398 T; 0 U; 0 Other; 

Query Match 2.2%; Score 35.2; DB 5; Length 2412; 
Best Local Similarity 55.8%; Pred. No. 11; 

Matches 67; Conservative 0; Mismatches 53; Indels 0; Gaps 0; 

Qy 57 CAGCCATGACCAGTGCTGTTTGTGCCCTTTGTGTGGCCTCCCCTGCTGTTGGGCTCTCTC 116 

M I I I I I I II 111 I I I I Ml I I I I I I I I M Ml 

Db 1797 CAGCAGTGCCCGCTCCATTTCGGCCTCTCGGGCTGACTCCTGCAGCTGCTGCTCTAGCTC 1738 

Qy 117 TGTCTTTGCTCCTTAGAGCTGGGGCACCTGAGCCCTCCTCTGTGCCAGCCTTTCTCCCAG 176

|| I I I I I I I I I I I I 1 II I I I I I I I I I I I I I I I M 

Db 1737 CTTCACACGGACCTTGAGCTGCTCCACGTGCCCCAGCACCTGAGCCCGCTCTTCTTCCAG 1678 



RESULT 47 
ADC32278/c

ID ADC32278 standard; cDNA; 2412 BP. 
XX 

AC ADC32278; 
XX 

DT 18-DEC-2003 (first entry) 
XX 

DE Human novel cDNA contig sequence, SEQ ID NO: 2360. 
XX 

KW Human; diagnostic; drug screening; forensics; gene mapping; 

KW biodiversity assessment; Parkinson's disease; Alzheimer's disease; 

KW neurodegenerative diseases; anaemia; platelet disorder; wound; burns; 

KW ulcers; osteoporosis; autoimmune disease; cancer; 

KW molecular weight marker; food supplement; antiparkinsonian; nootropic; 

KW neuroprotective; antianaemic; anticoagulant; thrombolytic; vulnerary; 

KW antiulcer; osteopathic; immunosuppressive; antiinflammatory; cytostatic; 

KW gene therapy; chromosome 11q23; ss.
XX

OS Homo sapiens.
XX

PN WO2003029271-A2.
XX

PD 10-APR-2003.
XX

PF 24-SEP-2002; 2002WO-US030474.
XX

PR 24-SEP-2001; 2001US-0324631P.
XX 

PA (HYSE-) HYSEQ INC. 
XX 

PI Tang TY, Zhang J, Ren F, Xue AJ, Zhao QA, Wang J, Wehrman T; 

PI Zhou P, Ghosh M, Wang D, Ma Y, Asundi V, Wang Z, Weng G; 

PI Haley-Vicente D, Drmanac RT; 
XX 

DR WPI; 2003-371981/35. 

DR P-PSDB; ADC33045. 
XX 

PT New polynucleotide and polypeptide useful for diagnosing, preventing or 

PT treating conditions such as neurodegenerative diseases, anemias, platelet 

PT disorders, wounds, burns, ulcers, osteoporosis, autoimmune diseases or 

PT cancer. 
XX 

PS Example 2; SEQ ID NO 2360; 1185pp; English. 
XX 

CC The invention relates to 971 novel human cDNA sequences (ADC29919- 

CC ADC30889) and the polypeptides they encode (ADC30890-ADC31860). The

CC invention also relates to nucleic acid sequences over 99% identical with

CC the novel human cDNAs. The invention additionally encompasses expression

CC vectors and host cells comprising a nucleic acid of the invention; the 

CC recombinant production of a polypeptide of the invention; an antibody 

CC against a polypeptide of the invention; a method of detecting 

CC polynucleotides or polypeptides of the invention; and methods of 

CC identifying a compound which binds to a polypeptide of the invention. The 

CC invention further discloses methods of preventing, treating or



CC ameliorating a medical condition; kits comprising polynucleotide probes 

CC and/or monoclonal antibodies for carrying out the methods of the 

CC invention; methods for the identification of compounds that modulate the 

CC expression or activity of the polynucleotide and/or polypeptide; and 767 

CC contig sequences corresponding to the cDNA sequences of the invention 

CC (ADC31861-ADC32627) and the polypeptides encoded by the contigs (ADC32628 

CC -ADC33394). The nucleic acids and polypeptides of the invention are

CC useful in diagnostics, drug screening, forensics, gene mapping, in the 

CC identification of mutations responsible for genetic disorders or other 

CC traits, for assessing biodiversity, and in producing many other types of 

CC data and products dependent on DNA and amino acid sequences. They are 

CC also used for treating diseases such as Parkinson's disease, Alzheimer's 

CC disease and other neurodegenerative diseases, anaemia, platelet 

CC disorders, wounds, burns, ulcers, osteoporosis, autoimmune diseases or 

CC cancer. The nucleic acids may also be used as hybridisation probes or 

CC primers, and in the recombinant production of a protein. The polypeptides 

CC are also useful in generating antibodies, as molecular weight markers, 

CC and as food supplements. The present sequence represents a human contig 

CC sequence used in an example of the invention. Note: The sequence data for 

CC this patent did not form part of the printed specification, but was 

CC obtained in electronic format directly from WIPO at 

CC ftp.wipo.int/pub/published_pct_sequences.

XX 

SQ Sequence 2412 BP; 537 A; 743 C; 734 G; 398 T; 0 U; 0 Other; 

Query Match 2.2%; Score 35.2; DB 9; Length 2412; 

Best Local Similarity 55.8%; Pred. No. 11; 

Matches 67; Conservative 0; Mismatches 53; Indels 0; Gaps 0; 



Qy 57 CAGCCATGACCAGTGCTGTTTGTGCCCTTTGTGTGGCCTCCCCTGCTGTTGGGCTCTCTC 116

1 1 1 1 1 1 1 1 11 III III 1 Ml 1 1 1 1 1 1 1 1 II Ml

Db 1797 CAGCAGTGCCCGCTCCATTTCGGCCTCTCGGGCTGACTCCTGCAGCTGCTGCTCTAGCTC 1738

Qy 117 TGTCTTTGCTCCTTAGAGCTGGGGCACCTGAGCCCTCCTCTGTGCCAGCCTTTCTCCCAG 176

II 1 1 1 1 1 1 1 1 1 1 1 1 1 II 1 1 1 1 1 1 1 II 1 1 1 1 1 1 1 1

Db 1737 CTTCACACGGACCTTGAGCTGCTCCACGTGCCCCAGCACCTGAGCCCGCTCTTCTTCCAG 1678



RESULT 48
ABN59902/c

ID ABN59902 standard; cDNA; 4866 BP.
XX

AC ABN59902;
XX

DT 28-JUN-2002 (first entry)
XX

DE Novel human coding sequence SEQ ID NO: 313.
XX

KW Human; antianaemic; vulnerary; antiinflammatory; immunomodulator;

KW antiinfertility; cerebroprotective; cytostatic; rheumatic; gene therapy;

KW neuroprotective; antiparkinsonian; protein therapy; EST;

KW expressed sequence tag; gene; ss.
XX

OS Homo sapiens.
XX

PN WO200222660-A2.
XX





PD 21-MAR-2002. 
XX 

PF 10-SEP-2001; 2001WO-US026015.
XX

PR 11-SEP-2000; 2000US-00659671.
XX 

PA (HYSE-) HYSEQ INC. 
XX 

PI Tang YT, Liu C, Zhou P, Asundi V, Zhang J, Zhao QA, Ren F; 

PI Xue AJ, Yang Y, Wehrman T, Drmanac RT;

XX 

DR WPI; 2002-292408/33. 

DR P-PSDB; ABB97489. 
XX 

PT An isolated polynucleotide for treating diseases associated with its 

PT encoded polypeptide such as cancer and multiple sclerosis. 

XX 

PS Claim 1; SEQ ID NO 313; 509pp; English. 
XX 

CC The present invention provides the protein and coding sequences of 444 

CC novel human proteins. These were isolated from expressed sequences tags 

CC (ESTs). They can be used to stimulate cell growth, to regulate

CC haematopoiesis e.g. to treat aplastic anaemia, to help tissue regrowth 

CC e.g. in burn treatment, to regulate the immune system e.g. to treat 

CC multiple sclerosis, to regulate activin or inhibin e.g. to treat 

CC infertility, to regulate haemostasis or thrombolysis e.g. to treat stroke 

CC and cancer, to screen for drugs, to treat inflammatory conditions e.g. 

CC rheumatoid arthritis, and to treat nervous system disorders e.g. 

CC Parkinson's disease. The present sequence is a coding sequence of the 

CC invention 

XX 

SQ Sequence 4866 BP; 1038 A; 1481 C; 1463 G; 884 T; 0 U; 0 Other; 

Query Match 2.2%; Score 35.2; DB 6; Length 4866; 
Best Local Similarity 55.8%; Pred. No. 17; 

Matches 67; Conservative 0; Mismatches 53; Indels 0; Gaps 0; 

Qy 57 CAGCCATGACCAGTGCTGTTTGTGCCCTTTGTGTGGCCTCCCCTGCTGTTGGGCTCTCTC 116 

I I I I 1 I II II III III I III I I I I I I I I II III 

Db 1838 CAGCAGTGCCCGCTCCATTTCGGCCTCTCGGGCTGACTCCTGCAGCTGCTGCTCTAGCTC 1779 

Qy 117 TGTCTTTGCTCCTTAGAGCTGGGGCACCTGAGCCCTCCTCTGTGCCAGCCTTTCTCCCAG 176

II I I I I I I II I I I I I II I I I I I I I I I I I I I I I I I 

Db 1778 CTTCACACGGACCTTGAGCTGCTCCACGTGCCCCAGCACCTGAGCCCGCTCTTCTTCCAG 1719



RESULT 49
AAS72352/c 

ID AAS72352 standard; cDNA; 5011 BP. 
XX 

AC AAS72352; 
XX 

DT 13-FEB-2002 (first entry) 
XX 

DE DNA encoding novel human diagnostic protein #8156. 
XX 

KW Human; chromosome mapping; gene mapping; gene therapy; forensic; 



KW food supplement; medical imaging; diagnostic; genetic disorder; ss. 
XX 

OS Homo sapiens.
XX

PN WO200175067-A2.
XX

PD 11-OCT-2001.
XX

PF 30-MAR-2001; 2001WO-US008631.
XX

PR 31-MAR-2000; 2000US-00540217.

PR 23-AUG-2000; 2000US-00649167.
XX 

PA (HYSE-) HYSEQ INC. 
XX 

PI Drmanac RT, Liu C, Tang YT; 
XX 

DR WPI; 2001-639362/73. 

DR P-PSDB; ABG08165. 
XX 

PT New isolated polynucleotide and encoded polypeptides, useful in

PT diagnostics, forensics, gene mapping, identification of mutations 

PT responsible for genetic disorders or other traits and to assess 

PT biodiversity. 
XX 

PS Claim 1; SEQ ID NO 8156; 103pp; English. 
XX 

CC The invention relates to isolated polynucleotide (I) and polypeptide (II) 

CC sequences. (I) is useful as hybridisation probes, polymerase chain 

CC reaction (PCR) primers, oligomers, and for chromosome and gene mapping, 

CC and in recombinant production of (II). The polynucleotides are also used

CC in diagnostics as expressed sequence tags for identifying expressed 

CC genes. (I) is useful in gene therapy techniques to restore normal 

CC activity of (II) or to treat disease states involving (II). (II) is 

CC useful for generating antibodies against it, detecting or quantitating a 

CC polypeptide in tissue, as molecular weight markers and as a food 

CC supplement. (II) and its binding partners are useful in medical imaging 

CC of sites expressing (II). (I) and (II) are useful for treating disorders 

CC involving aberrant protein expression or biological activity. The 

CC polypeptide and polynucleotide sequences have applications in 

CC diagnostics, forensics, gene mapping, identification of mutations 

CC responsible for genetic disorders or other traits to assess biodiversity 

CC and to produce other types of data and products dependent on DNA and 

CC amino acid sequences. AAS64197-AAS94564 represent novel human diagnostic 

CC coding sequences of the invention. Note: The sequence data for this 

CC patent did not appear in the printed specification, but was obtained in 

CC electronic format directly from WIPO at 

CC ftp.wipo.int/pub/published_pct_sequences

XX 

SQ Sequence 5011 BP; 1096 A; 1500 C; 1500 G; 915 T; 0 U; 0 Other; 

Query Match 2.2%; Score 35.2; DB 5; Length 5011; 
Best Local Similarity 55.8%; Pred. No. 17; 

Matches 67; Conservative 0; Mismatches 53; Indels 0; Gaps 0;

Qy 57 CAGCCATGACCAGTGCTGTTTGTGCCCTTTGTGTGGCCTCCCCTGCTGTTGGGCTCTCTC 116 
I I I I I I I I II III III I Ml I I I I I I M II Ml 



Db 1841 CAGCAGTGCCCGCTCCATTTCGGCCTCTCGGGCTGACTCCTGCAGCTGCTGCTCTAGCTC 1782 

Qy 117 TGTCTTTGCTCCTTAGAGCTGGGGCACCTGAGCCCTCCTCTGTGCCAGCCTTTCTCCCAG 176

II II I I 1 I I I 1 I I I I I I I I I .HIM 

Db 1781 CTTCACACGGACCTTGAGCTGCTCCACGTGCCCCAGCACCTGAGCCCGCTCTTCTTCCAG 1722



RESULT 50 
ABK33144/c

ID ABK33144 standard; DNA; 907 BP. 
XX 

AC ABK33144; 
XX 

DT 08-MAY-2002 (first entry) 
XX 

DE DNA encoding novel secreted protein Z931276G1P. 
XX 

KW Protein secretion; mammalian secreted polypeptide; MSP; gene; ss. 
XX 

OS Homo sapiens.
XX

PN WO200202621-A2.
XX

PD 10-JAN-2002.
XX

PF 28-JUN-2001; 2001WO-US020638.
XX

PR 30-JUN-2000; 2000US-0215446P.
XX 

PA (ZYMO ) ZYMOGENETICS INC. 
XX 

PI Sheppard PO, Presnell SR; 
XX 

DR WPI; 2002-147999/19. 

DR P-PSDB; AAU83229. 
XX 

PT Novel isolated mammalian secreted polypeptide useful in therapeutic and 

PT diagnostic methods, to direct secretion of other proteins of interest 

PT from host cell, as educational tools, and as laboratory practicum kits. 
XX 

PS Claim 3; Page 374-376; 397pp; English. 
XX 

CC The invention describes an isolated mammalian secreted polypeptide (MSP) 

CC (I). (I) is useful to direct the secretion of other proteins of interest 

CC from a host cell, to monitor secretion of proteins, to degenerate 

CC sequences comprising all nucleotide sequences encoding a particular 

CC polypeptide, to screen for cell metabolism effecting receptors, for 

CC identifying new target receptors and drug design, for identifying, for 

CC protein purification, for determining the weight of expressed MSP 

CC polypeptides as a ratio to total protein expressed, for identifying 

CC peptide cleavage sites, for coupling amino and carboxy terminal tags, for 

CC amino acid sequence analysis, for monitoring biological activities of the 

CC protein in vitro and in vivo, and to teach analytical skills and as 

CC reagents for the study of cells, receptors, and other binding molecules. 

CC The polynucleotide is useful for radiation hybrid mapping, and somatic 

CC cell genetic technique developed for constructing high-resolution, 

CC contiguous maps of mammalian chromosomes. Reagents disclosed in the 



CC invention may be used to detect metabolic abnormalities characterised by 

CC over or under production of the protein. This sequence encodes an 

CC mammalian secreted polypeptide, described in the method of the invention 

XX 

SQ Sequence 907 BP; 199 A; 299 C; 271 G; 138 T; 0 U; 0 Other; 

Query Match 2.2%; Score 35; DB 6; Length 907; 

Best Local Similarity 47.1%; Pred. No. 7.2; 

Matches 107; Conservative 0; Mismatches 120; Indels 0; Gaps 0; 
Qy        706 CTCCTCCTCCTCCTGCGCAGGGCAGAGCCTGGACATTAAAACATGCCCTGCCTGAAGCCG 765

Db        468 CTCTGCCTTCTTCATCAGGGAGCTGGGACCTCGGACCAAGGCTCGGCCTTGGTGCAGCCT 409

Qy        766 CTTGCTGCTTCTCACTGATTTCTGCTCTCCCCTTCCTTGACTCGCCCACCACCTGTCCTG 825

Db        408 CAACCTGTCCCTGCGTCTCTTGTGCTCCCTGCTGCGTTTCCTGGTCCTCTCCCAACCTGG 349

Qy        826 TGTAGATGGAGAAGGCTCGGAGAGTGGGGGTGCTGGGGGCACAAAATGGAATGAACACTG 885

Db        348 AGGTCGGTTCTCCTTCTCGGGGTAGGGCAATGCCAGTCCCAGGAGCAGGTCCTTGTCCTG 289

Qy        886 CTGAAGGAATGCAGGGTTCACTTCAAGAAGAAAGCAGTGTGCAGGTG 932

Db        288 CAGCAGGCCTGCAGGACTCTGCTCAGGCAGCAGCCCCTCAGCCTCTG 242



Search completed: April 29, 2004, 15:06:56 
Job time : 761.83 secs 



GenCore version 5.1.6 
Copyright (c) 1993 - 2004 Compugen Ltd. 



OM nucleic - nucleic search, using sw model 



Run on: April 29, 2004, 14:53:14 ; Search time 143.666 Seconds 

(without alignments ) 
6064.561 Million cell updates/sec 

Title: US-09-989-981A-9_COPY_3436_5005 

Perfect score: 1570 

Sequence: 1 cgaagcatcctgaagtacag ctagagagcaaacccagagc 1570 

Scoring table: IDENTITY_NUC 
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Maximum DB seq length: 2000000000 

Post-processing: Minimum Match 0% 
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Sequence 
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US- 09- 566-991 -lift 


Sequence 


118, App 
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4 


                                 US-09-930-872-3        Sequence 3, Appli
c  41   31.6    2.0     4042  4  US-09-930-879-5        Sequence 5, Appli
        31.6    2.0  1230025  4                         Sequence 1, Appli
   43   31.4    2.0     1626  3  US-08-959-381A-4       Sequence 4, Appli
   44   31.2    2.0      441  3  US-n9-060-7Sfi-^S9     Sequence 352, App
c  45   31.2    2.0      441  4  US-09-670-314-352      Sequence 352, App
c  46   31.2    2.0      459  4  US-09-621-976-1509     Sequence 1509, Ap
c  47   30.8    2.0      505  4  US-09-621-976-15639    Sequence 15639, A
   48           2.0     1207  4  US-09-219-194-1        Sequence 1, Appli
c  49            .0     2676  4  US-09-489-039A-4738    Sequence 4738, Ap
                 .0     2919  4  US-09-489-039A-4696    Sequence 4696, Ap



ALIGNMENTS 



RESULT 1 
US-09-172-108-8 

; Sequence 8, Application US/09172108 

; Patent No. 6160104 

; GENERAL INFORMATION: 

; APPLICANT: Cunnigham, Mary Jane 

; APPLICANT: Zweiger, Gary B. 

; APPLICANT: Panzer, Scott R. 

; APPLICANT: Seilhamer, Jeffrey J. 

TITLE OF INVENTION: MARKERS FOR PEROXISOMAL PROLIFERATORS 
; FILE REFERENCE: PA-0012 US 

; CURRENT APPLICATION NUMBER: US/09/172,108 
; CURRENT FILING DATE: 1998-10-13 



; NUMBER OF SEQ ID NOS: 56 
; SOFTWARE: PERL Program 
; SEQ ID NO 8 

LENGTH: 235 

TYPE: DNA 

ORGANISM: Homo sapiens 

; FEATURE: 
; OTHER INFORMATION: 700138117H1 
US-09-172-108-8 



Query Match 9.6%; Score 150.8; DB 3; Length 235; 

Best Local Similarity 90.4%; Pred. No. 2.6e-37; 

Matches 161; Conservative 0; Mismatches 17; Indels 0; Gaps 0 

Qy        391 GAAAATTCACTTGCATTTGCTTCCTGCTAGCCATGGGTGAGCTGCCCTTTCTGAGTCCAG 450
              ||  |||||||  ||||||||||| ||| |||||| ||||||||||||||||||||||||
Db          1 GAGGATTCACTCACATTTGCTTCCCGCTGGCCATGAGTGAGCTGCCCTTTCTGAGTCCAG 60

Qy        451 AGGGAGCCAGAGGGCCTCACATCAACAGAGGGTCTCTGAGCTCCCTGGAGCAAGGTTCGG 510
              ||||||||||||||||||||| |||||||||||||| ||||||||||||| |||| || |
Db         61 AGGGAGCCAGAGGGCCTCACAACAACAGAGGGTCTCAGAGCTCCCTGGAGGAAGGCTCAG 120

Qy        511 TCACGGGCACAGAGGCTCGGCACAGCTTAGGTGTCCTGCATGTGTCCTACAGCGTCAG 568
              | || ||| ||||||||||||||||||||||||||||| ||||||||| |||||||||
Db        121 TTACAGGCTCAGAGGCTCGGCACAGCTTAGGTGTCCTGAATGTGTCCTTCAGCGTCAG 178



RESULT 2 

US-08-232-463-14 

; Sequence 14, Application US/08232463 
; Patent No. 5670367 

GENERAL INFORMATION: 

APPLICANT: DORNER, F. 
; APPLICANT: SCHEIFLINGER, F. 
; APPLICANT: FALKNER, F. G. 

TITLE OF INVENTION: RECOMBINANT FOWLPOX VIRUS 
; NUMBER OF SEQUENCES: 52 

CORRESPONDENCE ADDRESS: 
; ADDRESSEE: Foley & Lardner 

; STREET: 1800 Diagonal Road, Suite 500 

; CITY: Alexandria 

; STATE: VA 

COUNTRY: USA 
ZIP: 22313-0299 
; COMPUTER READABLE FORM: 

; MEDIUM TYPE: Floppy disk 

COMPUTER: IBM PC compatible 
; OPERATING SYSTEM: PC-DOS/MS-DOS 

; SOFTWARE: PatentIn Release #1.0, Version #1.25 

CURRENT APPLICATION DATA: 
; APPLICATION NUMBER: US/08/232,463 

FILING DATE: 

CLASSIFICATION: 435 
; PRIOR APPLICATION DATA: 

; APPLICATION NUMBER: US/07/935,313 

FILING DATE: 
; APPLICATION NUMBER: EP 91 114 300.6 



; FILING DATE: 26-AUG-1991 
; ATTORNEY/AGENT INFORMATION: 

NAME: BENT, Stephen A. 
; REGISTRATION NUMBER: 29,768 

REFERENCE/DOCKET NUMBER: 30472/114 IMMU 
TELECOMMUNICATION INFORMATION: 

TELEPHONE: (703)836-9300 

TELEFAX: (703)683-4109 

TELEX: 899149 
; INFORMATION FOR SEQ ID NO: 14: 
; SEQUENCE CHARACTERISTICS: 

; LENGTH: 7218 base pairs 

; TYPE: nucleic acid 

; STRANDEDNESS: single 

; TOPOLOGY: linear 

; IMMEDIATE SOURCE: 

; CLONE: pTZgpt-Fls 

US-08-232-463-14 



Query Match 3.1%; Score 48.6; DB 1; Length 7218; 

Best Local Similarity 4.3%; Pred. No. 0.00019; 

Matches 10; Conservative 143; Mismatches 78; Indels 0; Gaps 0; 

Qy 1 CGAAGCATCCTGAAGTACAGTCCCATTCCACAGCTGGGTCTCTTCTTTGGTTTTCTCAGC 60 

| | | | 1 | | | | : ::::::::: :: :::::::::: ::::::: : 
Db 1051 CGAGGGAGCTTGCGATYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYY 1110 

Qy 61 CATGACCAGTGCTGTTTGTGCCCTTTGTGTGGCCTCCCCTGCTGTTGGGCTCTCTCTGTC 120 

Db 1111 YYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYY 1170 

Qy 121 TTTGCTCCTTAGAGCTGGGGCACCTGAGCCCTCCTCTGTGCCAGCCTTTCTCCCAGCATT 180 

Db 1171 YYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYY 1230 

Qy 181 CCTYTCTGGCAAACACTTCCTATAAACACACCGTGTGTTCTGCCTATTGTC 231 

Db 1231 YYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYY 1281 



RESULT 3 

US-08-232-463-14/c 

; Sequence 14, Application US/08232463 
; Patent No. 5670367 

GENERAL INFORMATION: 
; APPLICANT: DORNER, F. 

; APPLICANT: SCHEIFLINGER, F. 

APPLICANT: FALKNER, F. G. 

TITLE OF INVENTION: RECOMBINANT FOWLPOX VIRUS 
; NUMBER OF SEQUENCES: 52 
; CORRESPONDENCE ADDRESS: 

; ADDRESSEE: Foley & Lardner 

STREET: 1800 Diagonal Road, Suite 500 
; CITY: Alexandria 

; STATE: VA 

; COUNTRY: USA 

; ZIP: 22313-0299 



COMPUTER READABLE FORM: 

MEDIUM TYPE: Floppy disk 

COMPUTER: IBM PC compatible 

OPERATING SYSTEM: PC-DOS/MS-DOS 
; SOFTWARE: PatentIn Release #1.0, Version #1.25 

CURRENT APPLICATION DATA: 
; APPLICATION NUMBER: US/08/232,463 

FILING DATE: 

; CLASSIFICATION: 435 
PRIOR APPLICATION DATA: 

APPLICATION NUMBER: US/07/935,313 

FILING DATE: 

APPLICATION NUMBER: EP 91 114 300.6 

; FILING DATE: 26-AUG-1991 
ATTORNEY/AGENT INFORMATION: 

NAME: BENT, Stephen A. 

REGISTRATION NUMBER: 29,768 
; REFERENCE/DOCKET NUMBER: 30472/114 IMMU 

TELECOMMUNICATION INFORMATION: 

TELEPHONE: (703)836-9300 
; TELEFAX: (703)683-4109 

TELEX: 899149 
; INFORMATION FOR SEQ ID NO: 14: 
SEQUENCE CHARACTERISTICS: 

LENGTH: 7218 base pairs 
; TYPE: nucleic acid 

; STRANDEDNESS: single 

; TOPOLOGY: linear 

; IMMEDIATE SOURCE: 

CLONE: pTZgpt-Fls 
US-08-232-463-14 

Query Match 3.0%; Score 47.2; DB 1; Length 7218; 

Best Local Similarity 3.4%; Pred. No. 0.00051; 

Matches 10; Conservative 175; Mismatches 113; Indels 0; Gaps 0; 
Qy 829 AGATGGAGAAGGCTCGGAGAGTGGGGGTGCTGGGGGCACAAAATGGAATGAACACTGCTG 888 



Db 1345 RRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRR 1286 

Qy 889 AAGGAATGCAGGGTTCACTTCAAGAAGAAAGCAGTGTGCAGGTGTACCATCTCCCAGTCA 948 

Db 1285 RRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRR 1226 

Qy 949 GAGACCCAGTAATCAGAGCAGCTAATGGGAGGCATGCTCCTTGGGTGGTGGCCAACTTGT 1008 



Db 1225 RRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRR 1166 

Qy 1009 CATTATACCTCCAAGGACAACAGAGTGGTACATAAGGCTAAAACAGAGTTGTCAACCTGT 1068 

Db 1165 RRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRR 1106 

Qy 1069 CCAGGGGCAACTGGGATGGGGTAGGGCTGGGAGCAGGGGTCTGGCACCTTCCAGGACC 1126 



Db 1105 RRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRATCGCAAGCTCCCTCGACC 1048 



RESULT 4 

US-09-007-005-17/c 

; Sequence 17, Application US/09007005B 

; Patent No. 6258558 

; GENERAL INFORMATION: 

; APPLICANT: Szostak, Jack W. 

; APPLICANT: Roberts, Richard W. 

; APPLICANT: Liu, Rihe 

; TITLE OF INVENTION: SELECTION OF PROTEINS USING RNA-PROTEIN 
; TITLE OF INVENTION: FUSIONS 
; FILE REFERENCE: 00786/350003 

; CURRENT APPLICATION NUMBER: US/09/007,005B 

; CURRENT FILING DATE: 1998-01-14 

; EARLIER APPLICATION NUMBER: 60/035,963 

; EARLIER FILING DATE: 1997-01-27 

; EARLIER APPLICATION NUMBER: 60/064,491 

EARLIER FILING DATE: 1997-11-06 
; NUMBER OF SEQ ID NOS: 33 

; SOFTWARE: FastSEQ for Windows Version 4.0 
; SEQ ID NO 17 

; LENGTH: 289 

TYPE: RNA 
; ORGANISM: Artificial Sequence 

FEATURE: 

; OTHER INFORMATION: Translation template 
; FEATURE: 

; NAME/KEY: misc_feature 
; LOCATION: (1)...(289) 
; OTHER INFORMATION: n = A,T,C or G 
US-09-007-005-17 

Query Match 2.5%; Score 39.2; DB 3; Length 289; 

Best Local Similarity 5.4%; Pred. No. 0.027; 

Matches 11; Conservative 92; Mismatches 99; Indels 0; Gaps 0 
Qy 31 CAGCTGGGTCTCTTCTTTGGTTTTCTCAGCCATGACCAGTGCTGTTTGTGCCCTTTGTGT 90 

Db 237 YAYCYGYCYAYGYCYTYGYSYNYNYSYNYNYSYNYNYSYNYNYSYNYNYSYNYNYSYNYN 178 

Qy 91 GGCCTCCCCTGCTGTTGGGCTCTCTCTGTCTTTGCTCCTTAGAGCTGGGGCACCTGAGCC 150 

Db 177 YSYNYNYSYNYNYSYNYNYSYNYNYSYNYNYSYNYNYSYNYNYSYNYNYSYNYNYSYNYN 118 

Qy 151 CTCCTCTGTGCCAGCCTTTCTCCCAGCATTCCTYTCTGGCAAACACTTCCTATAAACACA 210 



Db 117 YSYNYNYSYNYNYSYNYNYSYNYNYSYNYNYSYNYNYSYNYNYSYNYNYSYNYNYSYNYN 58 

Qy 211 CCGTGTGTTCTGCCTATTGTCG 232 



Db 57 YCYAYTYTYGYTYAYAYTYTYG 36 



RESULT 5 

US-09-244-796-17/c 

; Sequence 17, Application US/09244796 
; Patent No. 6281344 
; GENERAL INFORMATION: 



APPLICANT: Szostak, Jack W. 
APPLICANT: Roberts, Richard W. 
APPLICANT: Liu, Rihe 

TITLE OF INVENTION: SELECTION OF PROTEINS USING RNA-PROTEIN 
TITLE OF INVENTION: FUSIONS 
FILE REFERENCE: 00786/350007 
CURRENT APPLICATION NUMBER: US/09/244,796 
CURRENT FILING DATE: 1999-02-05 
EARLIER APPLICATION NUMBER: 60/035,963 
EARLIER FILING DATE: 1997-01-27 
EARLIER APPLICATION NUMBER: 60/064,491 
EARLIER FILING DATE: 1997-11-06 
EARLIER APPLICATION NUMBER: 09/007,005 
EARLIER FILING DATE: 1998-01-14 
NUMBER OF SEQ ID NOS: 33 

SOFTWARE: FastSEQ for Windows Version 4.0 
SEQ ID NO 17 
LENGTH: 289 
TYPE: RNA 

ORGANISM: Artificial Sequence 
FEATURE: 

OTHER INFORMATION: Translation template 
FEATURE: 

NAME/KEY: misc_feature 
LOCATION: (1)...(289) 
OTHER INFORMATION: n = A,T,C or G 
US-09-244-796-17 

Query Match 2.5%; Score 39.2; DB 3; Length 289; 

Best Local Similarity 5.4%; Pred. No. 0.027; 

Matches 11; Conservative 92; Mismatches 99; Indels 0; Gaps 0 

Qy 31 CAGCTGGGTCTCTTCTTTGGTTTTCTCAGCCATGACCAGTGCTGTTTGTGCCCTTTGTGT 90 

Db 237 YAYCYGYCYAYGYCYTYGYSYNYNYSYNYNYSYNYNYSYNYNYSYNYNYSYNYNYSYNYN 178 

Qy 91 GGCCTCCCCTGCTGTTGGGCTCTCTCTGTCTTTGCTCCTTAGAGCTGGGGCACCTGAGCC 150 

:::::: : : : ::::::::::::: : : : : : : : : 

Db 177 YSYNYNYSYNYNYSYNYNYSYNYNYSYNYNYSYNYNYSYNYNYSYNYNYSYNYNYSYNYN 118 

Qy 151 CTCCTCTGTGCCAGCCTTTCTCCCAGCATTCCTYTCTGGCAAACACTTCCTATAAACACA 210 

::::::: ::::::: ::::::::: : : : : : : : : 

Db 117 YSYNYNYSYNYNYSYNYNYSYNYNYSYNYNYSYNYNYSYNYNYSYNYNYSYNYNYSYNYN 58 

Qy 211 CCGTGTGTTCTGCCTATTGTCG 232 

: I | | : : : : | : | | : I 
Db 57 YCYAYTYTYGYTYAYAYTYTYG 36 



RESULT 6 

US-09-621-976-2813 

; Sequence 2813, Application US/09621976 
; Patent No. 6639063 
; GENERAL INFORMATION: 

APPLICANT: Dumas Milne Edwards, J.B. 
; APPLICANT: Jobert, S. 
; APPLICANT: Giordano, J.Y. 



; TITLE OF INVENTION: ESTs and Encoded Human Proteins. 

; FILE REFERENCE: GENSET.054PR2 

; CURRENT APPLICATION NUMBER: US/09/621,976 

; CURRENT FILING DATE: 2000-07-21 

; NUMBER OF SEQ ID NOS: 19335 

; SOFTWARE: Patent.pm 

; SEQ ID NO 2813 
; LENGTH: 832 
; TYPE: DNA 
; ORGANISM: Homo sapiens 
; FEATURE: 
; NAME/KEY: CDS 
; LOCATION: 235..399 
US-09-621-976-2813 

Query Match 2.4%; Score 38.4; DB 4; Length 832; 

Best Local Similarity 9.7%; Pred. No. 0.087; 

Matches 30; Conservative 146; Mismatches 132; Indels 0; Gaps 0; 

Qy 1261 ATTATGTGCATCTTAGGCAGCTCAGGTAAGTGCCTGGGGGGSCSGGGGCTCCTGTACTTC 1320 

Db 2 RWYWWKYTTWYAKCWTKWKWSWSYWMYWKWYYMKTYWRWR 61 

Qy 1321 TAAGGCAGGCTCTGGGAGGCTTTGGCTCYGTCTAAGCACAATGTTTAAGAAGTRAGTTTA 1380 

:|: | :: : : : :: :::| :::: : :::::: ::|:|:| : :: :: 

Db 62 YAMWGTYKKKAMCRTKTKKKKKKGYMWMWYWGWRRSYMAMWTRTWTGYAYYRSMMYW 121 

Qy 1381 AGTTGTAGAGAGGCAGCCATGCATTTGGCATTTGAATACAATCTGGTGACTTGTCTGGCT 1440 



Db 122 RCWKKKAYYRKTTCYSSKGWTWWKRWKKAWTTWWWKKTYYWAATRYWWMMCWTKRWRASW 181 

Qy 1441 GCCAATAGAACCTAGTACCAAAGTGAAATCTTGAGGAAAATCCCTGGAAAGAGTGGAAAG 1500 

: | : : : : | : : I : : I I : : : I ::::::::: : I I : : 

Db 182 WYCWWWGKARKWSTWRKSRSYASARSAKRCCYSCSWGAMSWKYMWRMWRWRGWATGAGM 241 

Qy 1501 TCCTGCCTAACACGTAAGTGCCTTCTTTGCTTGTTTGATTGACTGTGATGCTAGAGAGCA 1560 

:|: :: |: : :: :|:::: :|:| : :: :: |:: ::|: :| 

Db 242 AWRASCMMRRKYAGKSKTSYKSMWMCWTRSWKYCYTKARWTGYYCYRKGGMWGKRGRWYA 301 

Qy 1561 AACCCAGA 1568 



Db 302 SKKYMWKR 309 



RESULT 7 

US-09-389-956-11/c 

; Sequence 11, Application US/09389956 

; Patent No. 6586579 

; GENERAL INFORMATION: 

; APPLICANT: Huang, Shi 

; TITLE OF INVENTION: PR-Domain Containing Nucleic Acids, Polypeptides, 

; TITLE OF INVENTION: Antibodies and Methods 

; FILE REFERENCE: P-LJ 3611 

; CURRENT APPLICATION NUMBER: US/09/389,956 

; CURRENT FILING DATE: 1999-09-03 

; NUMBER OF SEQ ID NOS: 93 

; SOFTWARE: PatentIn Ver. 2.0 



; SEQ ID NO 11 
; LENGTH: 2236 

TYPE: DNA 
; ORGANISM: Homo sapiens 

FEATURE: 

; NAME/KEY: CDS 

; LOCATION: (1)..(1455) 
US-09-389-956-11 

Query Match 2.3%; Score 36.2; DB 4; Length 2236; 

Best Local Similarity 50.3%; Pred. No. 0.74; 

Matches 83; Conservative 2; Mismatches 80; Indels 0; Gaps 0; 

Qy       1174 GGGCCTTGGTGGAACATCAAATCATGCCAGCAGAAGTGGGACAGGCAAATCCTCAAAGAT 1233
              | |||||| ||||| | ||    |    |||  |  || || |||||    |     |
Db        713 GTGCCTTGCTGGATCCTCTGCGCCGCGCAGATGCCGTAGGCCAGGCCGGGCACAGTACTG 654

Qy       1234 GTCTCCTTGTACATCGAGAGTGGCCAGATTATGTGCATCTTAGGCAGCTCAGGTAAGTGC 1293
              ||      | ||| |  | | |||  |       ||  ||  |||||||| ||   | ||
Db        653 GTGCAGAGGCACACCTCGCGAGGCAGGTCCCGCAGCCACTCCGGCAGCTCCGGCGGGGGC 594

Qy       1294 CTGGGGGGSCSGGGGCTCCTGTACTTCTAAGGCAGGCTCTGGGAG 1338
                || || : : | ||| |||     |  | || ||| | | |||
Db        593 GCGGCGGCCGCAGCGCTGCTGGTGCCCACAAGCCGGCGCAGCGAG 549



RESULT 8 

US-09-389-956-9/c 

; Sequence 9, Application US/09389956 
; Patent No. 6586579 
; GENERAL INFORMATION: 
; APPLICANT: Huang, Shi 

; TITLE OF INVENTION: PR-Domain Containing Nucleic Acids, Polypeptides, 
; TITLE OF INVENTION: Antibodies and Methods 
; FILE REFERENCE: P-LJ 3611 

; CURRENT APPLICATION NUMBER: US/09/389,956 
; CURRENT FILING DATE: 1999-09-03 
; NUMBER OF SEQ ID NOS: 93 

; SOFTWARE: PatentIn Ver. 2.0 
; SEQ ID NO 9 

; LENGTH: 2488 

TYPE: DNA 
; ORGANISM: Homo sapiens 

FEATURE: 

NAME/KEY: CDS 

; LOCATION: (1)..(1707) 
US-09-389-956-9 

Query Match 2.3%; Score 36.2; DB 4; Length 2488; 

Best Local Similarity 50.3%; Pred. No. 0.79; 

Matches 83; Conservative 2; Mismatches 80; Indels 0; Gaps 0; 

Qy       1174 GGGCCTTGGTGGAACATCAAATCATGCCAGCAGAAGTGGGACAGGCAAATCCTCAAAGAT 1233
              | |||||| ||||| | ||    |    |||  |  || || |||||    |     |
Db        713 GTGCCTTGCTGGATCCTCTGCGCCGCGCAGATGCCGTAGGCCAGGCCGGGCACAGTACTG 654

Qy       1234 GTCTCCTTGTACATCGAGAGTGGCCAGATTATGTGCATCTTAGGCAGCTCAGGTAAGTGC 1293
              ||      | ||| |  | | |||  |       ||  ||  |||||||| ||   | ||
Db        653 GTGCAGAGGCACACCTCGCGAGGCAGGTCCCGCAGCCACTCCGGCAGCTCCGGCGGGGGC 594

Qy       1294 CTGGGGGGSCSGGGGCTCCTGTACTTCTAAGGCAGGCTCTGGGAG 1338
                || || : : | ||| |||     |  | || ||| | | |||
Db        593 GCGGCGGCCGCAGCGCTGCTGGTGCCCACAAGCCGGCGCAGCGAG 549



RESULT 9 

US-09-621-976-8976/c 

Sequence 8976, Application US/09621976 
Patent No. 6639063 
GENERAL INFORMATION: 
APPLICANT: Dumas Milne Edwards, J.B. 
APPLICANT: Jobert, S. 
APPLICANT: Giordano, J.Y. 

TITLE OF INVENTION: ESTs and Encoded Human Proteins. 
FILE REFERENCE: GENSET.054PR2 
CURRENT APPLICATION NUMBER: US/09/621,976 
CURRENT FILING DATE: 2000-07-21 
NUMBER OF SEQ ID NOS: 19335 
SOFTWARE: Patent.pm 
SEQ ID NO 8976 
LENGTH: 399 
TYPE: DNA 

ORGANISM: Homo sapiens 
US-09-621-976-8976 

Query Match 2.3%; Score 35.6; DB 4; Length 399; 

Best Local Similarity 11.2%; Pred. No. 0.43; 

Matches 29; Conservative 121; Mismatches 110; Indels 0; Gaps 0 

Qy 540 GGTGTCCTGCATGTGTCCTACAGCGTCAGGTAAGGGGACCTCCACAGCAAAAAGCTAGGC 599 

Db 294 KGGSTYMAMRSRRGSTGRWSYRRAMWRGSKSWGGGSYYRMAGYRSSRWRSWYSAMWRKKK 235 

Qy 600 TCTCTGATTGCCTTTTCTGAATGGGTGGGTGGGCCTGTGGGCTTTGGGTTGTCTGTCCAG 659 

Db 234 MTCWKGRSSWGSRSTGYYAWMYKKSWCTSRKWMYYKKRRKKWRRKCTSTKRTCYRGSTYK 175 

Qy 660 CAGATCAGGGTGAAAGTGGACAGTCTGTAACAACAGTGAGTCGTTCCTCCTCCTCCTCCT 719 

| :: | :::::::: I : : : : : : : : : : : I I I I : I : I I 
Db 174 CWKAYYTKKRRKWTRWTYYYYKSYMSMKKTWRMKTAYYWTKRWKMTRTKWTWCTMCWKCT 115 

Qy 720 GCGCAGGGCAGAGCCTGGACATTAAAACATGCCCTGCCTGAAGCCGCTTGCTGCTTCTCA 779 

: : | | : : : : : | : : : : I : : I : : : I : I :::::: :::::: 
Db 114 TYWMAGTMMYRYRRYWYYAKRAKWSKRCTWSTTCYCMKYMAKKCWSYWWSMSMMKW 55 

Qy 780 CTGATTTCTGCTCTCCCCTT 799 

: : : | : : : : : : : : : | : 
Db 54 WWKWTYYYYYYMMKWSKMTY 35 



RESULT 10 
US-09-540-224-3 

; Sequence 3, Application US/09540224 
; Patent No. 6468543 



; GENERAL INFORMATION: 

; APPLICANT: Gilbertson, Debra G. 

; APPLICANT: Hart, Charles E. 

TITLE OF INVENTION: METHODS FOR PROMOTING GROWTH OF BONE, 
; TITLE OF INVENTION: LIGAMENT AND CARTILAGE USING ZVEGF4 
; FILE REFERENCE: 00-28 

; CURRENT APPLICATION NUMBER: US/09/540,224 

; CURRENT FILING DATE: 2000-03-31 

; EARLIER APPLICATION NUMBER: US 60/180,169 

; EARLIER FILING DATE: 2000-02-04 

; NUMBER OF SEQ ID NOS: 9 

; SOFTWARE: FastSEQ for Windows Version 3.0 

; SEQ ID NO 3 

; LENGTH: 1472 

; TYPE: DNA 

; ORGANISM: Mus musculus 

; FEATURE: 

; NAME/KEY: CDS 

; LOCATION: (93)...(1205) 
US-09-540-224-3 

Query Match 2.1%; Score 33.6; DB 4; Length 1472; 

Best Local Similarity 59.4%; Pred. No. 3.8; 

Matches 57; Conservative 0; Mismatches 39; Indels 0; Gaps 0; 

Qy        827 GTAGATGGAGAAGGCTCGGAGAGTGGGGGTGCTGGGGGCACAAAATGGAATGAACACTGC 886
              | || || ||||| ||| |||||  |   || ||   |   || ||||  ||  |||
Db        435 GAAGTTGAAGAAGTCTCAGAGAGCAGCACTGTTGTCAGAGGAAGATGGTGTGGCCACAAG 494

Qy        887 TGAAGGAATGCAGGGTTCACTTCAAGAAGAAAGCAG 922
                 |    | || || | || ||||||| ||| |||
Db        495 GAGATCCCTCCAAGGATAACGTCAAGAACAAACCAG 530



RESULT 11 
US-09-564-595D-52 

; Sequence 52, Application US/09564595D 

; Patent No. 6495668 

; GENERAL INFORMATION: 

; APPLICANT: Gilbert, Teresa 

; APPLICANT: Hart, Charles E. 

; APPLICANT: Sheppard, Paul O. 

; TITLE OF INVENTION: GROWTH FACTOR HOMOLOG ZVEGF4 
; FILE REFERENCE: 99-19 

; CURRENT APPLICATION NUMBER: US/09/564,595D 

CURRENT FILING DATE: 2000-05-03 
; PRIOR APPLICATION NUMBER: US 09/304,216 
; PRIOR FILING DATE: 1999-05-03 
; PRIOR APPLICATION NUMBER: US 60/164,463 
; PRIOR FILING DATE: 1999-11-10 
; PRIOR APPLICATION NUMBER: US 60/180,169 
; PRIOR FILING DATE: 2000-02-04 
; NUMBER OF SEQ ID NOS: 57 

; SOFTWARE: FastSEQ for Windows Version 4.0 
; SEQ ID NO 52 

LENGTH: 1472 

TYPE: DNA 



; ORGANISM: Mus musculus 
; FEATURE: 
; NAME/KEY: CDS 
; LOCATION: (93)...(1205) 
US-09-564-595D-52 

Query Match 2.1%; Score 33.6; DB 4; Length 1472; 

Best Local Similarity 59.4%; Pred. No. 3.8; 

Matches 57; Conservative 0; Mismatches 39; Indels 0; Gaps 0 

Qy        827 GTAGATGGAGAAGGCTCGGAGAGTGGGGGTGCTGGGGGCACAAAATGGAATGAACACTGC 886
              | || || ||||| ||| |||||  |   || ||   |   || ||||  ||  |||
Db        435 GAAGTTGAAGAAGTCTCAGAGAGCAGCACTGTTGTCAGAGGAAGATGGTGTGGCCACAAG 494

Qy        887 TGAAGGAATGCAGGGTTCACTTCAAGAAGAAAGCAG 922
                 |    | || || | || ||||||| ||| |||
Db        495 GAGATCCCTCCAAGGATAACGTCAAGAACAAACCAG 530



RESULT 12 
US-09-808-972-3 

; Sequence 3, Application US/09808972 

; Patent No. 6630142 

; GENERAL INFORMATION: 

; APPLICANT: Hart, Charles E. 

; APPLICANT: Topouzis, Stavros 

; APPLICANT: Gilbertson, Debra G. 

; TITLE OF INVENTION: METHOD OF TREATING FIBROPROLIFERATIVE 
; TITLE OF INVENTION: DISORDERS 

FILE REFERENCE: 00-79 
; CURRENT APPLICATION NUMBER: US/09/808,972 
; CURRENT FILING DATE: 2001-03-14 
; PRIOR APPLICATION NUMBER: US 60/235,295 
; PRIOR FILING DATE: 2000-09-26 
; PRIOR APPLICATION NUMBER: US 09/564,595 
; PRIOR FILING DATE: 2000-05-03 
; PRIOR APPLICATION NUMBER: US 60/180,169 
; PRIOR FILING DATE: 2000-02-04 
; PRIOR APPLICATION NUMBER: US 60/164,463 
; PRIOR FILING DATE: 1999-11-10 
; PRIOR APPLICATION NUMBER: US 60/132,250 
; PRIOR FILING DATE: 1999-05-03 
; NUMBER OF SEQ ID NOS: 13 

; SOFTWARE: FastSEQ for Windows Version 3.0 
; SEQ ID NO 3 

LENGTH: 1472 
; TYPE: DNA 
; ORGANISM: Mus musculus 

; FEATURE: 

; NAME/KEY: CDS 

; LOCATION: (93)...(1205) 
US-09-808-972-3 



Query Match 2.1%; Score 33.6; DB 4; Length 1472; 

Best Local Similarity 59.4%; Pred. No. 3.8; 

Matches 57; Conservative 0; Mismatches 39; Indels 0; Gaps 0 



Qy        827 GTAGATGGAGAAGGCTCGGAGAGTGGGGGTGCTGGGGGCACAAAATGGAATGAACACTGC 886
              | || || ||||| ||| |||||  |   || ||   |   || ||||  ||  |||
Db        435 GAAGTTGAAGAAGTCTCAGAGAGCAGCACTGTTGTCAGAGGAAGATGGTGTGGCCACAAG 494

Qy        887 TGAAGGAATGCAGGGTTCACTTCAAGAAGAAAGCAG 922
                 |    | || || | || ||||||| ||| |||
Db        495 GAGATCCCTCCAAGGATAACGTCAAGAACAAACCAG 530



RESULT 13 

US-09-484-970B-17/c 

Sequence 17, Application US/09484970B 
Patent No. 6426186 
GENERAL INFORMATION: 



APPLICANT: Jones, Karen A. 
APPLICANT: Volkmuth, Wayne 
APPLICANT: Walker, Michael G. 
TITLE OF INVENTION: BONE REMODELING GENES 
FILE REFERENCE: PB-0014 US 

CURRENT APPLICATION NUMBER: US/09/484,970B 
CURRENT FILING DATE: 2000-01-18 
NUMBER OF SEQ ID NOS: 172 
SOFTWARE: PERL Program 
SEQ ID NO 17 
LENGTH: 1651 
TYPE: DNA 

ORGANISM: Homo sapiens 
FEATURE: 

NAME/KEY: misc_feature 

OTHER INFORMATION: Incyte ID No. 6426186 126510. 2CB1 
NAME/KEY: unsure 
LOCATION: 767-846 

OTHER INFORMATION: a, t, c, g, or other 
US-09-484-970B-17 

Query Match 2.1%; Score 33.6; DB 4; Length 1651; 

Best Local Similarity 52.9%; Pred. No. 4.1; 

Matches 72; Conservative 0; Mismatches 64; Indels 0; Gaps 0; 

Qy        415 TGCTAGCCATGGGTGAGCTGCCCTTTCTGAGTCCAGAGGGAGCCAGAGGGCCTCACATCA 474
              ||| ||| |  |     |  ||||| ||||| ||||| || |   |||||| |  ||
Db       1075 TGCCAGCAACAGACTCTCCTCCCTTGCTGAGACCAGAAGGTGAGTGAGGGCTTTGCAATG 1016

Qy        475 ACAGAGGGTCTCTGAGCTCCCTGGAGCAAGGTTCGGTCACGGGCACAGAGGCTCGGCACA 534
              |  |  || |   ||| |  |||   |    | | | || |||  |  |||        |
Db       1015 AGGGCAGGGCATGGAGGTGACTGTCACTCTTTGCTGGCAGGGGGTCTCAGGACTATAGGA 956

Qy        535 GCTTAGGTGTCCTGCA 550
               |||  | | | ||||
Db        955 ACTTTAGAGCCTTGCA 940



RESULT 14 

US-09-621-976-17202 

; Sequence 17202, Application US/09621976 
; Patent No. 6639063 
; GENERAL INFORMATION: 



APPLICANT: Dumas Milne Edwards, J.B. 
APPLICANT: Jobert, S. 
APPLICANT: Giordano, J.Y. 

TITLE OF INVENTION: ESTs and Encoded Human Proteins. 
FILE REFERENCE: GENSET.054PR2 
CURRENT APPLICATION NUMBER: US/09/621,976 
CURRENT FILING DATE: 2000-07-21 
NUMBER OF SEQ ID NOS: 19335 
SOFTWARE: Patent.pm 
SEQ ID NO 17202 
LENGTH: 364 
TYPE: DNA 

ORGANISM: Homo sapiens 
US-09-621-976-17202 

Query Match 2.1%; Score 33.2; DB 4; Length 364; 

Best Local Similarity 13.5%; Pred. No. 2.3; 

Matches 38; Conservative 118; Mismatches 126; Indels 0; Gaps 0; 

Qy 146 GAGCCCTCCTCTGTGCCAGCCTTTCTCCCAGCATTCCTYTCTGGCAAACACTTCCTATAA 205 

|||:: | : : : | : I I : : I : :: | ::::::: : : : : : : 

Db 21 GAGYSGMCKSSRSYGRRSSCCGSMGWSGCSCSKRSWSRCRCMKSMWSWMMYMRSMKYKRS 80 

Qy 206 ACACACCGTGTGTTCTGCCTATTGTCGAGATAAGGACACTCTGGCTAAAGGTACATCAGA 265 

Db 81 TCASCKYKGGKMACMTCWSTGAMYRYMASYGWCYSYM 140 

Qy 266 TAATGGCATCGTTGGCCAAATTGGTGAACTGTTATCTCACGAGGATTCCAGGGCTGGGTA 325 

: | I : | : : | : : : : | : : : : : : : : : : : : I : : : I 

Db 141 GMCCMWCAGSGMCYSRSAGSRYSKKGSRGRWYWKKGCSRATSKKGRMMWMKKGSRRRATS 200 

Qy 326 GGATCGGACAGGGCACTCCCATTGGCTCCTCAGTTAAAGCTGCCCTGGAGCCGGACAGGC 385 

: : : : : : : : : : : I : I I : : : : : : | | : : : I : : : : : I : : 
Db 201 RYGMMSSMYGASKRMSSMCSASTRMSSASCMMYMMMSAGSYASCAWKMSKYRRCAKWSCT 260 

Qy 386 CACTAGAAAATTCACTTGCATTTGCTTCCTGCTAGCCATGGG 427 

: :::| : : | :: : I : : : I : : : I : M : 

Db 261 YSWYMRASMKKSKYCAWSRKGSKCCMYSRKGSKSCYCCWGGS 302 



RESULT 15 

US-09-621-976-2813/c 

; Sequence 2813, Application US/09621976 
; Patent No. 6639063 
; GENERAL INFORMATION: 

; APPLICANT: Dumas Milne Edwards, J.B. 

; APPLICANT: Jobert, S. 

; APPLICANT: Giordano, J.Y. 

TITLE OF INVENTION: ESTs and Encoded Human Proteins. 

; FILE REFERENCE: GENSET.054PR2 
; CURRENT APPLICATION NUMBER: US/09/621,976 
; CURRENT FILING DATE: 2000-07-21 
; NUMBER OF SEQ ID NOS: 19335 
; SOFTWARE: Patent.pm 
; SEQ ID NO 2813 

LENGTH: 832 
; TYPE: DNA 



ORGANISM: Homo sapiens 
; FEATURE: 
; NAME/KEY: CDS 
; LOCATION: 235..399 
US-09-621-976-2813 

Query Match 2.1%; Score 33.2; DB 4; Length 832; 

Best Local Similarity 12.3%; Pred. No. 3.7; 

Matches 20; Conservative 82; Mismatches 60; Indels 0; Gaps 0; 

Qy 1340 CTTTGGCTCYGTCTAAGCACAATGTTTAAGAAGTRAGTTTAAGTTGTAGAGAGGCAGCCA 1399 

| : : : | : : : : : : I : | ::: | | ::::::: | : | : : : : : : : | | : : 
Db 188 CWWWGRWWSTYWYMAWGKKWWRYATTWRRAMMWWWAAWTMMWYMWWASRGAAMYRR 129 

Qy 1400 TGCATTTGGCATTTGAATACAATCTGGTGACTTGTCTGGCTGCCAATAGAACCTAGTACC 1459 

| : : : : : : : : : : | : : : : : | : : : : | : : : : | : : : | : : : 

Db 128 TMMMWGYRYWWRKKSYRRTRCAWAYAWKTKRSYYWCWRWKW 69 

Qy 1460 AAAGTGAAATCTTGAGGAAAATCCCTGGAAAGAGTGGAAAGT 1501 

Db 68 RACWKTRYWRWWAWAMWRMWWTMMMMYYWYWRAMKRRWMWR 27 



RESULT 16 

US-09-103-840A-2/c 

; Sequence 2, Application US/09103840A 

; Patent No. 6294328 

; GENERAL INFORMATION: 

; APPLICANT: FLEISCHMAN, Robert D. 

; APPLICANT: WHITE, Owen R. 

; APPLICANT: FRASER, Claire M. 

; APPLICANT: VENTER, John C. 

; TITLE OF INVENTION: DNA SEQUENCES FOR STRAIN ANALYSIS IN MYCOBACTERIUM 

; TITLE OF INVENTION: TUBERCULOSIS 

; FILE REFERENCE: 24366-20007.00 

; CURRENT APPLICATION NUMBER: US/09/103,840A 

; CURRENT FILING DATE: 1998-06-24 

; NUMBER OF SEQ ID NOS: 2 

; SOFTWARE: PatentIn Ver. 2.1 

; SEQ ID NO 2 

LENGTH: 4403765 

; TYPE: DNA 

; ORGANISM: Mycobacterium tuberculosis 
FEATURE: 

OTHER INFORMATION: CDC 1551 

OTHER INFORMATION: "n" bases at various positions throughout the sequence 
; OTHER INFORMATION: represent a, t, c or g 
US-09-103-840A-2 

Query Match 2.1%; Score 33.2; DB 3; Length 4403765; 

Best Local Similarity 52.1%; Pred. No. 1.2e+02; 

Matches 74; Conservative 0; Mismatches 68; Indels 0; Gaps 0; 

Qy        599 CTCTCTGATTGCCTTTTCTGAATGGGTGGGTGGGCCTGTGGGCTTTGGGTTGTCTGTCCA 658
              | | | | | | || |  ||| |  || | |  ||  |  ||| |||  |||    || |
Db     866529 CACGCAGCTCGTCTATGGTGAGTTCGTCGTTATGCAGGACGGCGTTGACTTGGACCTCGA 866470

Qy        659 GCAGATCAGGGTGAAAGTGGACAGTCTGTAACAACAGTGAGTCGTTCCTCCTCCTCCTCC 718
              ||||||| |  ||    ||  | ||| | | |   |    | |  ||| ||    || |
Db     866469 GCAGATCCGACTGGCCTTGAGCGGTCAGCACCGCGATCACGGCCATCCGCCGGTCCCGCA 866410

Qy        719 TGCGCAGGGCAGAGCCTGGACA 740
              ||  |||| | |  |  |  ||
Db     866409 TGGACAGGCCGGGACGGGTCCA 866388



RESULT 17 

US-09-103-840A-1/c 

; Sequence 1, Application US/09103840A 

; Patent No. 6294328 

; GENERAL INFORMATION: 

; APPLICANT: FLEISCHMAN, Robert D. 

; APPLICANT: WHITE, Owen R. 

; APPLICANT: FRASER, Claire M. 

; APPLICANT: VENTER, John C. 

; TITLE OF INVENTION: DNA SEQUENCES FOR STRAIN ANALYSIS IN MYCOBACTERIUM 

; TITLE OF INVENTION: TUBERCULOSIS 

; FILE REFERENCE: 24366-20007.00 

; CURRENT APPLICATION NUMBER: US/09/103,840A 

; CURRENT FILING DATE: 1998-06-24 

; NUMBER OF SEQ ID NOS: 2 

; SOFTWARE: PatentIn Ver. 2.1 

; SEQ ID NO 1 

LENGTH: 4411529 

TYPE: DNA 

; ORGANISM: Mycobacterium tuberculosis 

OTHER INFORMATION: H37Rv 
US-09-103-840A-1 

Query Match 2.1%; Score 33.2; DB 3; Length 4411529; 

Best Local Similarity 52.1%; Pred. No. 1.2e+02; 

Matches 74; Conservative 0; Mismatches 68; Indels 0; Gaps 0; 

Qy        599 CTCTCTGATTGCCTTTTCTGAATGGGTGGGTGGGCCTGTGGGCTTTGGGTTGTCTGTCCA 658
              | | | | | | || |  ||| |  || | |  ||  |  ||| |||  |||    || |
Db     864399 CACGCAGCTCGTCTATGGTGAGTTCGTCGTTATGCAGGACGGCGTTGACTTGGACCTCGA 864340

Qy        659 GCAGATCAGGGTGAAAGTGGACAGTCTGTAACAACAGTGAGTCGTTCCTCCTCCTCCTCC 718
              ||||||| |  ||    ||  | ||| | | |   |    | |  ||| ||    || |
Db     864339 GCAGATCCGACTGGCCTTGAGCGGTCAGCACCGCGATCACGGCCATCCGCCGGTCCCGCA 864280

Qy        719 TGCGCAGGGCAGAGCCTGGACA 740
              ||  |||| | |  |  |  ||
Db     864279 TGGACAGGCCGGGACGGGTCCA 864258



RESULT 18 
US-09-078-294-4 

; Sequence 4, Application US/09078294 
; Patent No. 6265211 



; GENERAL INFORMATION: 

; APPLICANT: Choo, Kong-Hong Andy 

; APPLICANT: Du Sart, Desiree 

; APPLICANT: Cancilla, Michael R. 

; TITLE OF INVENTION: A NOVEL NUCLEIC ACID MOLECULE 
; FILE REFERENCE: Davies Col 

; CURRENT APPLICATION NUMBER: US/09/078,294 

; CURRENT FILING DATE: 1998-05-13 

; NUMBER OF SEQ ID NOS: 29 

; SOFTWARE: PatentIn Ver. 2.0 

; SEQ ID NO 4 

LENGTH: 80246 
; TYPE: DNA 

; ORGANISM: Nucleotide sequence of NC-contig 
US-09-078-294-4 

Query Match 2.1%; Score 33; DB 3; Length 80246; 

Best Local Similarity 48.6%; Pred. No. 56; 

Matches 87; Conservative 1; Mismatches 91; Indels 0; Gaps 0; 

Qy 9 CCTGAAGTACAGTCCCATTCCACAGCTGGGTCTCTTCTTTGGTTTTCTCAGCCATGACCA 68 

III I I I I I I I I I I I I I I I I I I I I I II 

Db 21544 CCTCCCTTCCCTTCCCCTCCCCTCCCCTTCCCTTCTCCCTCTCCTTCCCTTCCTCTTCCC 21603 

Qy 69 GTGCTGTTTGTGCCCTTTGTGTGGCCTCCCCTGCTGTTGGGCTCTCTCTGTCTTTGCTCC 128

III I I I I I I I II i I I II I I I I II II I I I I I I I I 

Db 21604 TTCCTTCCTCTTCCCTTCCTTTCCCCTCCCCTTCCTTTCCCTTCCTCCCTCCCTTCCTCC 21663 

Qy 129 TTAGAGCTGGGGCACCTGAGCCCTCCTCTGTGCCAGCCTTTCTCCCAGCATTCCTYTCT 187

I I II I I I I I 1 I I I I I I I I I I I I I I I I : I I 

Db 21664 CTTCTTTCCTTCCCTTCTTTCCTTCCTCATTTCCTCCCTTCCTTCCTTCCTTCCTTCCT 21722 



RESULT 19 
US-09-078-294-3 

; Sequence 3, Application US/09078294 

; Patent No. 6265211 

; GENERAL INFORMATION: 

; APPLICANT: Choo, Kong-Hong Andy 

; APPLICANT: Du Sart, Desiree 

; APPLICANT: Cancilla, Michael R. 

; TITLE OF INVENTION: A NOVEL NUCLEIC ACID MOLECULE 

FILE REFERENCE: Davies Col 
; CURRENT APPLICATION NUMBER: US/09/078,294
; CURRENT FILING DATE: 1998-05-13
; NUMBER OF SEQ ID NOS: 29
; SOFTWARE: PatentIn Ver. 2.0
; SEQ ID NO 3
; LENGTH: 80595
; TYPE: DNA 

ORGANISM: Nucleotide sequence of HC-contig 
US-09-078-294-3 

Query Match 2.1%; Score 33; DB 3; Length 80595; 

Best Local Similarity 48.6%; Pred. No. 56; 

Matches 87; Conservative 1; Mismatches 91; Indels 0; Gaps 0; 



Qy 9 CCTGAAGTACAGTCCCATTCCACAGCTGGGTCTCTTCTTTGGTTTTCTCAGCCATGACCA 68 

III I I I I II I I I I II II I I I I I I I II 

Db 21806 CCTCCCTTCCCTTCCCCTCCCCTCCCCTTCCCTTCTCCCTCTCCTTCCCTTCCTCTTCCC 21865 

Qy 69 GTGCTGTTTGTGCCCTTTGTGTGGCCTCCCCTGCTGTTGGGCTCTCTCTGTCTTTGCTCC 128 

III I I I II I I II I I I I II I I I II II I I I I I II I 

Db 21866 TTCCTTCCTCTTCCCTTCCTTTCCCCTCCCCTTCCTTTCCCTTCCTCCCTCCCTTCCTCC 21925 

Qy 129 TTAGAGCTGGGGCACCTGAGCCCTCCTCTGTGCCAGCCTTTCTCCCAGCATTCCTYTCT 187

I I El I I I I I III I I I I I I I I 111111:11 

Db 21926 CTTCTTTCCTTCCCTTCTTTCCTTCCTCATTTCCTCCCTTCCTTCCTTCCTTCCTTCCT 21984 



RESULT 20

US-10-162-012-41/c

; Sequence 41, Application US/10162012 

; Patent No. 6682597 

; GENERAL INFORMATION: 

; APPLICANT: Curtis, Rory A.J. 

; APPLICANT: Silos-Santiago, Inmaculada 

; APPLICANT: Gu, Wei 

; TITLE OF INVENTION: NOVEL HUMAN ION CHANNEL AND TRANSPORTER FAMILY MEMBERS 

; FILE REFERENCE: 10448-190001 

; CURRENT APPLICATION NUMBER: US/10/162,012 

; CURRENT FILING DATE: 2002-06-04 

; PRIOR APPLICATION NUMBER: US 60/209,845 

PRIOR FILING DATE: 2000-06-06 
; PRIOR APPLICATION NUMBER: US 09/875,321 
; PRIOR FILING DATE: 2001-06-06 

PRIOR APPLICATION NUMBER: PCT/US01/18340

PRIOR FILING DATE: 2001-06-06 
; PRIOR APPLICATION NUMBER: US 60/209,257 
; PRIOR FILING DATE: 2000-06-05 
; PRIOR APPLICATION NUMBER: US 09/875,423 
; PRIOR FILING DATE: 2001-06-05 
; PRIOR APPLICATION NUMBER: PCT/US01/18398
; PRIOR FILING DATE: 2001-06-05 
; PRIOR APPLICATION NUMBER: US 60/209,238 
; PRIOR FILING DATE: 2000-06-05 
; PRIOR APPLICATION NUMBER: US 09/875,363 
; PRIOR FILING DATE: 2001-06-05 
; PRIOR APPLICATION NUMBER: PCT/US01/18247 
; PRIOR FILING DATE: 2001-06-05 

PRIOR APPLICATION NUMBER: US 60/227,068 
; PRIOR FILING DATE: 2000-08-22 
; PRIOR APPLICATION NUMBER: US 09/928,530 

PRIOR FILING DATE: 2001-08-13 
; PRIOR APPLICATION NUMBER: PCT/US01/25475 
; PRIOR FILING DATE: 2001-08-15 
; PRIOR APPLICATION NUMBER: US 60/226,770 
; PRIOR FILING DATE: 2000-08-21 
; PRIOR APPLICATION NUMBER: US 09/934,421 
; PRIOR FILING DATE: 2001-08-21 
; PRIOR APPLICATION NUMBER: PCT/US01/26096 
; PRIOR FILING DATE: 2001-08-21 
; PRIOR APPLICATION NUMBER: US 60/279,281 
; PRIOR FILING DATE: 2001-03-28 



PRIOR APPLICATION NUMBER: US 10/109,029 
PRIOR FILING DATE: 2002-03-28 
PRIOR APPLICATION NUMBER: PCT/US02/09728
PRIOR FILING DATE: 2002-03-28 
PRIOR APPLICATION NUMBER: US 60/290,288 
PRIOR FILING DATE: 2001-05-11 
PRIOR APPLICATION NUMBER: US (not assigned) 
PRIOR FILING DATE: 2002-05-13 
NUMBER OF SEQ ID NOS: 48

SOFTWARE: FastSEQ for Windows Version 4.0
SEQ ID NO 41 
LENGTH: 1119 
TYPE: DNA 

ORGANISM: Homo sapiens 
US-10-162-012-41 

Query Match 2.1%; Score 32.8; DB 4; Length 1119; 

Best Local Similarity 50.0%; Pred. No. 5.8; 

Matches 82; Conservative 0; Mismatches 82; Indels 0; Gaps 0; 

Qy 887 TGAAGGAATGCAGGGTTCACTTCAAGAAGAAAGCAGTGTGCAGGTGTACCATCTCCCAGT 946

|| II I I I I I I I I I I I I I I I I I I I I I I I I 

Db 388 TGGTGGCTGGCCGGGAGGACATCCAGAGGGAGAAGAGGCTGATGAGCATGCTGGCAAAGT 329 

Qy 947 CAGAGACCCAGTAATCAGAGCAGCTAATGGGAGGCATGCTCCTTGGGTGGTGGCCAACTT 1006

Ml II I II III I I I I i I I I I I I II IN III I 

Db 328 CAGTGAGCAGGTGTGCTGCGTCAGTCATGACAGCCAAGCTGTGTGCCAGGTACCCACCAA 269 

Qy 1007 GTCATTATACCTCCAAGGACAACAGAGTGGTACATAAGGCTAAA 1050

I I 1 I I I I I I I I I II II I I I I I I I 

Db 268 CGACTTCTCCGATCATGAACAACAGGCAGATGGCAGAGGCTACA 225



RESULT 21 

US-10-162-012-39/c

; Sequence 39, Application US/10162012 

; Patent No. 6682597 

; GENERAL INFORMATION: 

; APPLICANT: Curtis, Rory A.J. 

; APPLICANT: Silos-Santiago, Inmaculada 

; APPLICANT: Gu, Wei 

; TITLE OF INVENTION: NOVEL HUMAN ION CHANNEL AND TRANSPORTER FAMILY MEMBERS 

; FILE REFERENCE: 10448-190001 

; CURRENT APPLICATION NUMBER: US/10/162,012 

; CURRENT FILING DATE: 2002-06-04 

; PRIOR APPLICATION NUMBER: US 60/209,845 

; PRIOR FILING DATE: 2000-06-06 

PRIOR APPLICATION NUMBER: US 09/875,321 

PRIOR FILING DATE: 2001-06-06 
; PRIOR APPLICATION NUMBER: PCT/US01/18340
; PRIOR FILING DATE: 2001-06-06 
; PRIOR APPLICATION NUMBER: US 60/209,257 
; PRIOR FILING DATE: 2000-06-05 
; PRIOR APPLICATION NUMBER: US 09/875,423 
; PRIOR FILING DATE: 2001-06-05 
; PRIOR APPLICATION NUMBER: PCT/US01/18398
; PRIOR FILING DATE: 2001-06-05 



PRIOR APPLICATION NUMBER: US 60/209,238 
PRIOR FILING DATE: 2000-06-05 
PRIOR APPLICATION NUMBER: US 09/875,363 
PRIOR FILING DATE: 2001-06-05 
PRIOR APPLICATION NUMBER: PCT/US01/18247
PRIOR FILING DATE: 2001-06-05 
PRIOR APPLICATION NUMBER: US 60/227,068 
PRIOR FILING DATE: 2000-08-22 
PRIOR APPLICATION NUMBER: US 09/928,530 
PRIOR FILING DATE: 2001-08-13 
PRIOR APPLICATION NUMBER: PCT/US01/25475 
PRIOR FILING DATE: 2001-08-15 
PRIOR APPLICATION NUMBER: US 60/226,770 
PRIOR FILING DATE: 2000-08-21 
PRIOR APPLICATION NUMBER: US 09/934,421 
PRIOR FILING DATE: 2001-08-21 
PRIOR APPLICATION NUMBER: PCT/US01/26096 
PRIOR FILING DATE: 2001-08-21 
PRIOR APPLICATION NUMBER: US 60/279,281 
PRIOR FILING DATE: 2001-03-28 
PRIOR APPLICATION NUMBER: US 10/109,029 
PRIOR FILING DATE: 2002-03-28 
PRIOR APPLICATION NUMBER: PCT/US02/09728
PRIOR FILING DATE: 2002-03-28 
PRIOR APPLICATION NUMBER: US 60/290,288 
PRIOR FILING DATE: 2001-05-11 
PRIOR APPLICATION NUMBER: US (not assigned) 
PRIOR FILING DATE: 2002-05-13 
NUMBER OF SEQ ID NOS: 48

SOFTWARE: FastSEQ for Windows Version 4.0 
SEQ ID NO 39 
LENGTH: 1630 
TYPE: DNA 

ORGANISM: Homo sapiens 
FEATURE:
NAME/KEY: CDS
LOCATION: (230)...(1345)
US-10-162-012-39 

Query Match 2.1%; Score 32.8; DB 4; Length 1630; 

Best Local Similarity 50.0%; Pred. No. 7.2; 

Matches 82; Conservative 0; Mismatches 82; Indels 0; Gaps 0; 

Qy 887 TGAAGGAATGCAGGGTTCACTTCAAGAAGAAAGCAGTGTGCAGGTGTACCATCTCCCAGT 946

II II I I I I I I I I I I I I I I I I I I I I I 111 

Db 617 TGGTGGCTGGCCGGGAGGACATCCAGAGGGAGAAGAGGCTGATGAGCATGCTGGCAAAGT 558 

Qy 947 CAGAGACCCAGTAATCAGAGCAGCTAATGGGAGGCATGCTCCTTGGGTGGTGGCCAACTT 1006 

I I I I I I II III I I II I I I I I I I II III I II I 

Db 557 CAGTGAGCAGGTGTGCTGCGTCAGTCATGACAGCCAAGCTGTGTGCCAGGTACCCACCAA 498

Qy 1007 GTCATTATACCTCCAAGGACAACAGAGTGGTACATAAGGCTAAA 1050

I I I I I I I I I I I I I I II I I I I I I I 

Db 497 CGACTTCTCCGATCATGAACAACAGGCAGATGGCAGAGGCTACA 454



RESULT 22 



US-09-064-199-12/c 

; Sequence 12, Application US/09064199 
; Patent No. 6632604 

GENERAL INFORMATION: 
; APPLICANT: MACH, Bernard 

; TITLE OF INVENTION: NUCLEIC ACID SEQUENCES OF CIITA GENES
; WHICH CAN BE INVOLVED IN CONTROLLING AND REGULATING THE
; EXPRESSION OF GENES ENCODING MHC TYPE II MOLECULES, AND
; THEIR USE, IN PARTICULAR AS DRUGS
NUMBER OF SEQUENCES: 25 
; CORRESPONDENCE ADDRESS: 

; ADDRESSEE: BURNS, DOANE, SWECKER & MATHIS, L.L.P. 

; STREET: P.O. Box 1404

; CITY: Alexandria 

STATE: Virginia 

COUNTRY: United States 
; ZIP: 22313-1404 

; COMPUTER READABLE FORM: 

MEDIUM TYPE: Floppy disk 

COMPUTER: IBM PC compatible 
; OPERATING SYSTEM: PC-DOS/MS-DOS 

; SOFTWARE: PatentIn Release #1.0, Version #1.30

CURRENT APPLICATION DATA: 
; APPLICATION NUMBER: US/09/064,199

; FILING DATE: 22-Apr-1998 

; CLASSIFICATION: <Unknown> 

; PRIOR APPLICATION DATA: 

APPLICATION NUMBER: FR 97-04954 
; FILING DATE: 22-APR-1997 

ATTORNEY/AGENT INFORMATION: 

NAME: Rea, Teresa Stanek 

REGISTRATION NUMBER: 30,427 

REFERENCE/DOCKET NUMBER: 017753-096
TELECOMMUNICATION INFORMATION: 

TELEPHONE: (703) 836-6620 

TELEFAX: (703) 836-2021 
; INFORMATION FOR SEQ ID NO: 12: 
SEQUENCE CHARACTERISTICS: 

LENGTH: 4346 base pairs 
; TYPE: nucleic acid 

; STRANDEDNESS: single 

TOPOLOGY: linear 
MOLECULE TYPE: DNA (genomic) 
; FEATURE: 

NAME/KEY: cIIta of type II
; SEQUENCE DESCRIPTION: SEQ ID NO: 12: 

US-09-064-199-12 

Query Match 2.1%; Score 32.8; DB 4; Length 4346; 

Best Local Similarity 54.0%; Pred. No. 12; 

Matches 67; Conservative 0; Mismatches 57; Indels 0; Gaps 0; 

Qy 1080 TGGGATGGGGTAGGGCTGGGAGCAGGGGTCTGGCACCTTCCAGGACCCTACTCTGCCTTT 1139

I I I I I I I I I I I I I I I I I I I I I I I I II III I I I I I 

Db 3676 TGGCAGGGGCCTGGGCAGGCAGAATGGGGCTGCCTCTGTCACTCCCTCTGGCCTGGCCGT 3617



Qy 1140 GCCCTTGTGGGATTTCCTTTAAAGCAACCGTGTCGGGCCTTGGTGGAACATCAAATCATG 1199

II I I II I I I I I I I I I II II I I I I | | | || | 
Db 3616 GGCTGTCCGCAATGTCCTTCAGAGAAGGCCTCGGGGCTCCTGGGTGGACCGCAGCTGCCA 3557 

Qy 1200 CCAG 1203

I I I I 

Db 3556 CCAG 3553 



RESULT 23 

US-09-064-199-14/c 

; Sequence 14, Application US/09064199 

; Patent No. 6632604 

; GENERAL INFORMATION: 

APPLICANT: MACH, Bernard 

; TITLE OF INVENTION: NUCLEIC ACID SEQUENCES OF CIITA GENES
; WHICH CAN BE INVOLVED IN CONTROLLING AND REGULATING THE
; EXPRESSION OF GENES ENCODING MHC TYPE II MOLECULES, AND
; THEIR USE, IN PARTICULAR AS DRUGS
NUMBER OF SEQUENCES: 25 
; CORRESPONDENCE ADDRESS: 

ADDRESSEE: BURNS, DOANE, SWECKER & MATHIS, L.L.P. 

STREET: P.O. Box 1404 

CITY: Alexandria 
; STATE: Virginia 

COUNTRY: United States 

ZIP: 22313-1404 
; COMPUTER READABLE FORM: 

; MEDIUM TYPE: Floppy disk 

COMPUTER: IBM PC compatible 

OPERATING SYSTEM: PC-DOS/MS-DOS 

SOFTWARE: PatentIn Release #1.0, Version #1.30
CURRENT APPLICATION DATA:
; APPLICATION NUMBER: US/09/064,199

FILING DATE: 22-Apr-1998 
CLASSIFICATION: <Unknown> 
PRIOR APPLICATION DATA: 
; APPLICATION NUMBER: FR 97-04954 

; FILING DATE: 22-APR-1997 

ATTORNEY/AGENT INFORMATION: 
; NAME: Rea, Teresa Stanek 

REGISTRATION NUMBER: 30,427 
REFERENCE/DOCKET NUMBER: 017753-096 
TELECOMMUNICATION INFORMATION: 
; TELEPHONE: (703) 836-6620 

; TELEFAX: (703) 836-2021 

INFORMATION FOR SEQ ID NO: 14: 
SEQUENCE CHARACTERISTICS: 

LENGTH: 4366 base pairs 
TYPE: nucleic acid 
STRANDEDNESS: single 
; TOPOLOGY: linear 

MOLECULE TYPE: DNA (genomic) 
; FEATURE:

NAME/KEY: cIIta of type IV
; SEQUENCE DESCRIPTION: SEQ ID NO: 14: 

US-09-064-199-14 



Query Match 2.1%; Score 32.8; DB 4; Length 4366; 

Best Local Similarity 54.0%; Pred. No. 13; 

Matches 67; Conservative 0; Mismatches 57; Indels 0; Gaps 0; 

Qy 1080 TGGGATGGGGTAGGGCTGGGAGCAGGGGTCTGGCACCTTCCAGGACCCTACTCTGCCTTT 1139 

I I I 1 I I 1 I I I I I I I I I II I I I I I I II III I I I I I 

Db 3696 TGGCAGGGGCCTGGGCAGGCAGAATGGGGCTGCCTCTGTCACTCCCTCTGGCCTGGCCGT 3637 

Qy 1140 GCCCTTGTGGGATTTCCTTTAAAGCAACCGTGTCGGGCCTTGGTGGAACATCAAATCATG 1199 

II I I I I I I I I I I I I I II II I I I I I M M I 

Db 3636 GGCTGTCCGCAATGTCCTTCAGAGAAGGCCTCGGGGCTCCTGGGTGGACCGCAGCTGCCA 3577 

Qy 1200 CCAG 1203 

I I I I 

Db 3576 CCAG 3573



RESULT 24

US-09-064-199-13/c

; Sequence 13, Application US/09064199 
; Patent No. 6632604 

GENERAL INFORMATION: 

APPLICANT: MACH, Bernard 
; TITLE OF INVENTION: NUCLEIC ACID SEQUENCES OF CIITA GENES
; WHICH CAN BE INVOLVED IN CONTROLLING AND REGULATING THE
; EXPRESSION OF GENES ENCODING MHC TYPE II MOLECULES, AND
; THEIR USE, IN PARTICULAR AS DRUGS

; NUMBER OF SEQUENCES: 25 

CORRESPONDENCE ADDRESS: 

ADDRESSEE: BURNS, DOANE, SWECKER & MATHIS, L.L.P. 
; STREET: P.O. Box 1404 

; CITY: Alexandria 

; STATE: Virginia

COUNTRY: United States 

ZIP: 22313-1404 
; COMPUTER READABLE FORM: 

MEDIUM TYPE: Floppy disk 
; COMPUTER: IBM PC compatible 

OPERATING SYSTEM: PC-DOS/MS-DOS 
; SOFTWARE: PatentIn Release #1.0, Version #1.30

CURRENT APPLICATION DATA: 
; APPLICATION NUMBER: US/09/064,199

; FILING DATE: 22-Apr-1998 

CLASSIFICATION: <Unknown> 
; PRIOR APPLICATION DATA: 

APPLICATION NUMBER: FR 97-04954 

FILING DATE: 22-APR-1997 
ATTORNEY/AGENT INFORMATION: 
; NAME: Rea, Teresa Stanek 

REGISTRATION NUMBER: 30,427 
; REFERENCE/DOCKET NUMBER: 017753-096 



TELECOMMUNICATION INFORMATION: 
; TELEPHONE: (703) 836-6620 

TELEFAX: (703) 836-2021 
INFORMATION FOR SEQ ID NO: 13: 
SEQUENCE CHARACTERISTICS: 

LENGTH: 4418 base pairs 
; TYPE: nucleic acid 

STRANDEDNESS: single 
TOPOLOGY: linear 
MOLECULE TYPE: DNA (genomic) 
FEATURE:

NAME/KEY: cIIta of type III
SEQUENCE DESCRIPTION: SEQ ID NO: 13: 
US-09-064-199-13 

Query Match 2.1%; Score 32.8; DB 4; Length 4418; 

Best Local Similarity 54.0%; Pred. No. 13; 

Matches 67; Conservative 0; Mismatches 57; Indels 0; Gaps 0; 

Qy 1080 TGGGATGGGGTAGGGCTGGGAGCAGGGGTCTGGCACCTTCCAGGACCCTACTCTGCCTTT 1139

I I I I I I I I I I I I I I I I I I I I I I I I II III Ml I I 

Db 3748 TGGCAGGGGCCTGGGCAGGCAGAATGGGGCTGCCTCTGTCACTCCCTCTGGCCTGGCCGT 3689 

Qy 1140 GCCCTTGTGGGATTTCCTTTAAAGCAACCGTGTCGGGCCTTGGTGGAACATCAAATCATG 1199

II I I II I I I I I I I I I II II I Ml III II I 

Db 3688 GGCTGTCCGCAATGTCCTTCAGAGAAGGCCTCGGGGCTCCTGGGTGGACCGCAGCTGCCA 3629

Qy 1200 CCAG 1203 

||||

Db 3628 CCAG 3625 



RESULT 25 
US-09-064-199-8/c 

; Sequence 8, Application US/09064199 
; Patent No. 6632604 

GENERAL INFORMATION: 

APPLICANT: MACH, Bernard 
; TITLE OF INVENTION: NUCLEIC ACID SEQUENCES OF CIITA GENES
; WHICH CAN BE INVOLVED IN CONTROLLING AND REGULATING THE
; EXPRESSION OF GENES ENCODING MHC TYPE II MOLECULES, AND
; THEIR USE, IN PARTICULAR AS DRUGS
NUMBER OF SEQUENCES: 25 
; CORRESPONDENCE ADDRESS: 

; ADDRESSEE: BURNS, DOANE, SWECKER & MATHIS, L.L.P. 

STREET: P.O. Box 1404

CITY: Alexandria 
; STATE: Virginia 

; COUNTRY: United States 

ZIP: 22313-1404 
COMPUTER READABLE FORM: 

MEDIUM TYPE: Floppy disk 

COMPUTER: IBM PC compatible 

OPERATING SYSTEM: PC-DOS/MS-DOS 

SOFTWARE: PatentIn Release #1.0, Version #1.30



CURRENT APPLICATION DATA: 

APPLICATION NUMBER: US/09/064,199
FILING DATE: 22-Apr-1998 
CLASSIFICATION: <Unknown> 
PRIOR APPLICATION DATA: 

APPLICATION NUMBER: FR 97-04954 
FILING DATE: 22-APR-1997 
ATTORNEY/AGENT INFORMATION: 

NAME: Rea, Teresa Stanek 
REGISTRATION NUMBER: 30,427 
REFERENCE/DOCKET NUMBER: 017753-096
TELECOMMUNICATION INFORMATION: 
TELEPHONE: (703) 836-6620 
TELEFAX: (703) 836-2021 
INFORMATION FOR SEQ ID NO: 8: 
SEQUENCE CHARACTERISTICS: 

LENGTH: 4431 base pairs 
TYPE: nucleic acid 
STRANDEDNESS: single
TOPOLOGY: linear 
MOLECULE TYPE: DNA (genomic) 
FEATURE:

NAME/KEY: cIIta de type II
SEQUENCE DESCRIPTION: SEQ ID NO: 8: 
US-09-064-199-8 

Query Match 2.1%; Score 32.8; DB 4; Length 4431; 

Best Local Similarity 54.0%; Pred. No. 13; 

Matches 67; Conservative 0; Mismatches 57; Indels 0; Gaps 0; 

Qy 1080 TGGGATGGGGTAGGGCTGGGAGCAGGGGTCTGGCACCTTCCAGGACCCTACTCTGCCTTT 1139

II I I I I I I I I I I I I I I I I I I I I I I II III I I I I I 

Db 3761 TGGCAGGGGCCTGGGCAGGCAGAATGGGGCTGCCTCTGTCACTCCCTCTGGCCTGGCCGT 3702

Qy 1140 GCCCTTGTGGGATTTCCTTTAAAGCAACCGTGTCGGGCCTTGGTGGAACATCAAATCATG 1199

II I I I I I I I I I I I I I II I I I I I I III II I 
Db 3701 GGCTGTCCGCAATGTCCTTCAGAGAAGGCCTCGGGGCTCCTGGGTGGACCGCAGCTGCCA 3642 

Qy 1200 CCAG 1203 

I I I I 

Db 3641 CCAG 3638 



RESULT 26
US-09-641-999-2/c

; Sequence 2, Application US/09641999

; Patent No. 6379894 

; GENERAL INFORMATION: 

; APPLICANT: MACH, BERNARD 

; TITLE OF INVENTION: METHOD FOR SCREENING COMPOUNDS CAPABLE OF INHIBITING 
; TITLE OF INVENTION: FIXING BETWEEN THE STAT1 TRANSCRIPTION FACTOR AND THE
; TITLE OF INVENTION: USF1 TRANSCRIPTION FACTOR 

FILE REFERENCE: EGYP 3.3-007CONT 
; CURRENT APPLICATION NUMBER: US/09/641,999
; CURRENT FILING DATE: 2000-08-18
; NUMBER OF SEQ ID NOS: 6
; SOFTWARE: PatentIn Ver. 2.1



; SEQ ID NO 2 

LENGTH: 4441 

TYPE: DNA 
; ORGANISM: Homo sapiens 
US-09-641-999-2 

Query Match 2.1%; Score 32.8; DB 4; Length 4441;

Best Local Similarity 54.0%; Pred. No. 13; 

Matches 67; Conservative 0; Mismatches 57; Indels 0; Gaps 0; 

Qy 1080 TGGGATGGGGTAGGGCTGGGAGCAGGGGTCTGGCACCTTCCAGGACCCTACTCTGCCTTT 1139

I I I I I II I I I I I I I I 1 I I I I I I I I II III I I I I I 

Db 3771 TGGCAGGGGCCTGGGCAGGCAGAATGGGGCTGCCTCTGTCACTCCCTCTGGCCTGGCCGT 3712 

Qy 1140 GCCCTTGTGGGATTTCCTTTAAAGCAACCGTGTCGGGCCTTGGTGGAACATCAAATCATG 1199

II I I I I I I I I I I I I I II II I I I I III II I 

Db 3711 GGCTGTCCGCAATGTCCTTCAGAGAAGGCCTCGGGGCTCCTGGGTGGACCGCAGCTGCCA 3652 

Qy 1200 CCAG 1203 

I I I I 

Db 3651 CCAG 3648 



RESULT 27 

US-09-064-199-10/c 

; Sequence 10, Application US/09064199 
; Patent No. 6632604 

GENERAL INFORMATION: 

APPLICANT: MACH, Bernard 
; TITLE OF INVENTION: NUCLEIC ACID SEQUENCES OF CIITA GENES
; WHICH CAN BE INVOLVED IN CONTROLLING AND REGULATING THE
; EXPRESSION OF GENES ENCODING MHC TYPE II MOLECULES, AND
; THEIR USE, IN PARTICULAR AS DRUGS
; NUMBER OF SEQUENCES: 25 

CORRESPONDENCE ADDRESS: 
; ADDRESSEE: BURNS, DOANE, SWECKER & MATHIS, L.L.P. 

STREET: P.O. Box 1404
; CITY: Alexandria 

; STATE: Virginia 

COUNTRY: United States 
ZIP: 22313-1404 
COMPUTER READABLE FORM: 

MEDIUM TYPE: Floppy disk 
COMPUTER: IBM PC compatible 
OPERATING SYSTEM: PC-DOS/MS-DOS 
SOFTWARE: PatentIn Release #1.0, Version #1.30
CURRENT APPLICATION DATA: 
; APPLICATION NUMBER: US/09/064,199

FILING DATE: 22-Apr-1998 
; CLASSIFICATION: <Unknown> 

; PRIOR APPLICATION DATA: 

APPLICATION NUMBER: FR 97-04954 
FILING DATE: 22-APR-1997 
ATTORNEY/AGENT INFORMATION: 
; NAME: Rea, Teresa Stanek 



REGISTRATION NUMBER: 30,427 
REFERENCE/DOCKET NUMBER: 017753-096
TELECOMMUNICATION INFORMATION:
TELEPHONE: (703) 836-6620 
TELEFAX: (703) 836-2021 
INFORMATION FOR SEQ ID NO: 10: 
SEQUENCE CHARACTERISTICS: 

LENGTH: 4441 base pairs 
TYPE: nucleic acid 
STRANDEDNESS: single
TOPOLOGY: linear 
MOLECULE TYPE: DNA (genomic) 
FEATURE:

NAME/KEY: cIIta of type IV
SEQUENCE DESCRIPTION: SEQ ID NO: 10: 
US-09-064-199-10 

Query Match 2.1%; Score 32.8; DB 4; Length 4441; 

Best Local Similarity 54.0%; Pred. No. 13; 

Matches 67; Conservative 0; Mismatches 57; Indels 0; Gaps 0; 

Qy 1080 TGGGATGGGGTAGGGCTGGGAGCAGGGGTCTGGCACCTTCCAGGACCCTACTCTGCCTTT 1139

I I I I I I I I I E I I I I 1 I I I I I I I I I II III I I I I I 

Db 3771 TGGCAGGGGCCTGGGCAGGCAGAATGGGGCTGCCTCTGTCACTCCCTCTGGCCTGGCCGT 3712 

Qy 1140 GCCCTTGTGGGATTTCCTTTAAAGCAACCGTGTCGGGCCTTGGTGGAACATCAAATCATG 1199

II I I I I I I I I I I I I I II II I I I I III II I 

Db 3711 GGCTGTCCGCAATGTCCTTCAGAGAAGGCCTCGGGGCTCCTGGGTGGACCGCAGCTGCCA 3652 

Qy 1200 CCAG 1203 

I I I I 

Db 3651 CCAG 3648 



RESULT 28

US-08-519-547A-5/c

Sequence 5, Application US/08519547A 
Patent No. 5994082 
GENERAL INFORMATION: 
APPLICANT: 

TITLE OF INVENTION: Proteins Essential for the Expression of
TITLE OF INVENTION: Vertebrate MHC Class II Genes, DNA Sequences Encoding Same
TITLE OF INVENTION: and Pharmaceutical Compositions
NUMBER OF SEQUENCES: 6 
CORRESPONDENCE ADDRESS: 

ADDRESSEE: FISH & NEAVE 

STREET: 1251 AVENUE OF THE AMERICAS 

CITY: NEW YORK 

STATE: NEW YORK 

COUNTRY: U.S.A. 

ZIP: 10020-1104
COMPUTER READABLE FORM: 

MEDIUM TYPE: Floppy disk 

COMPUTER: IBM PC compatible 

OPERATING SYSTEM: MS-DOS 

SOFTWARE: WordPerfect 6.1 



CURRENT APPLICATION DATA: 

APPLICATION NUMBER: US/08/519,547A
FILING DATE: 25-AUG-1995 
CLASSIFICATION: 435 
PRIOR APPLICATION DATA: 

APPLICATION NUMBER: EP94113378.7 
FILING DATE: 26-AUG-1994 
ATTORNEY/AGENT INFORMATION: 
NAME: HALEY, JAMES F. 
REGISTRATION NUMBER: 27,794 
REFERENCE/DOCKET NUMBER: VOS-11
TELECOMMUNICATION INFORMATION: 
TELEPHONE: 212-596-9000 
TELEFAX: 212-596-9090 
INFORMATION FOR SEQ ID NO: 5: 
SEQUENCE CHARACTERISTICS: 
LENGTH: 4543 base pairs 
TYPE: nucleic acid 
STRANDEDNESS: double
TOPOLOGY: linear 
MOLECULE TYPE: DNA (genomic) 
HYPOTHETICAL: NO 
ANTI-SENSE: NO 
US-08-519-547A-5 

Query Match 2.1%; Score 32.8; DB 2; Length 4543; 

Best Local Similarity 54.0%; Pred. No. 13; 

Matches 67; Conservative 0; Mismatches 57; Indels 0; Gaps 0; 

Qy 1080 TGGGATGGGGTAGGGCTGGGAGCAGGGGTCTGGCACCTTCCAGGACCCTACTCTGCCTTT 1139

I I I I I I I I I I I I I I I I I I I I I I I I II IN I I I I I 

Db 3873 TGGCAGGGGCCTGGGCAGGCAGAATGGGGCTGCCTCTGTCACTCCCTCTGGCCTGGCCGT 3814

Qy 1140 GCCCTTGTGGGATTTCCTTTAAAGCAACCGTGTCGGGCCTTGGTGGAACATCAAATCATG 1199 

II I I I I II I I I 1 11 I II II I M I 

Db 3813 GGCTGTCCGCAATGTCCTTCAGAGAAGGCCTCGGGGCTCCTGGGTGGACCGCAGCTGCCA 3754 

Qy 1200 CCAG 1203 

||||

Db 3753 CCAG 3750 



RESULT 29 
US-09-064-199-9/c 

Sequence 9, Application US/09064199 
Patent No. 6632604 

GENERAL INFORMATION: 

APPLICANT: MACH, Bernard 

; TITLE OF INVENTION: NUCLEIC ACID SEQUENCES OF CIITA GENES
; WHICH CAN BE INVOLVED IN CONTROLLING AND REGULATING THE
; EXPRESSION OF GENES ENCODING MHC TYPE II MOLECULES, AND
; THEIR USE, IN PARTICULAR AS DRUGS
NUMBER OF SEQUENCES: 25 
CORRESPONDENCE ADDRESS: 

ADDRESSEE: BURNS, DOANE, SWECKER & MATHIS, L.L.P. 



STREET: P.O. Box 1404 
CITY: Alexandria 
STATE: Virginia 
COUNTRY: United States 
ZIP: 22313-1404 
COMPUTER READABLE FORM: 

MEDIUM TYPE: Floppy disk 
COMPUTER: IBM PC compatible 
OPERATING SYSTEM: PC-DOS/MS-DOS 

SOFTWARE: PatentIn Release #1.0, Version #1.30
CURRENT APPLICATION DATA: 

APPLICATION NUMBER: US/09/064,199
FILING DATE: 22-Apr-1998
CLASSIFICATION: <Unknown>
PRIOR APPLICATION DATA: 

APPLICATION NUMBER: FR 97-04954 
FILING DATE: 22-APR-1997 
ATTORNEY/AGENT INFORMATION: 

NAME: Rea, Teresa Stanek 
REGISTRATION NUMBER: 30,427 
REFERENCE/DOCKET NUMBER: 017753-096 
TELECOMMUNICATION INFORMATION: 
TELEPHONE: (703) 836-6620 
TELEFAX: (703) 836-2021 
INFORMATION FOR SEQ ID NO: 9: 
SEQUENCE CHARACTERISTICS: 

LENGTH: 4549 base pairs 
TYPE: nucleic acid 
STRANDEDNESS: single
TOPOLOGY: linear 
MOLECULE TYPE: DNA (genomic) 
FEATURE:

NAME/KEY: cIIta of type III
SEQUENCE DESCRIPTION: SEQ ID NO: 9: 
US-09-064-199-9 

Query Match 2.1%; Score 32.8; DB 4; Length 4549; 

Best Local Similarity 54.0%; Pred. No. 13; 

Matches 67; Conservative 0; Mismatches 57; Indels 0; Gaps 0; 

Qy 1080 TGGGATGGGGTAGGGCTGGGAGCAGGGGTCTGGCACCTTCCAGGACCCTACTCTGCCTTT 1139

III I III I II I I I I I I I I I I I I I I M IN I I I I I 

Db 3879 TGGCAGGGGCCTGGGCAGGCAGAATGGGGCTGCCTCTGTCACTCCCTCTGGCCTGGCCGT 3820 

Qy 1140 GCCCTTGTGGGATTTCCTTTAAAGCAACCGTGTCGGGCCTTGGTGGAACATCAAATCATG 1199 

II I I I I I I I I I I I I I II II I I I I Ml M I 
Db 3819 GGCTGTCCGCAATGTCCTTCAGAGAAGGCCTCGGGGCTCCTGGGTGGACCGCAGCTGCCA 3760

Qy 1200 CCAG 1203 

I I I I 

Db 3759 CCAG 3756 



RESULT 30 
US-09-064-199-2/c 

; Sequence 2, Application US/09064199 
; Patent No. 6632604 



; GENERAL INFORMATION: 

; APPLICANT: MACH, Bernard 

; TITLE OF INVENTION: NUCLEIC ACID SEQUENCES OF CIITA GENES
; WHICH CAN BE INVOLVED IN CONTROLLING AND REGULATING THE
; EXPRESSION OF GENES ENCODING MHC TYPE II MOLECULES, AND
; THEIR USE, IN PARTICULAR AS DRUGS
NUMBER OF SEQUENCES: 25 
CORRESPONDENCE ADDRESS: 

ADDRESSEE: BURNS, DOANE, SWECKER & MATHIS, L.L.P. 
STREET: P.O. Box 1404
; CITY: Alexandria 

; STATE: Virginia 

; COUNTRY: United States 

ZIP: 22313-1404 
COMPUTER READABLE FORM: 

MEDIUM TYPE: Floppy disk 
COMPUTER: IBM PC compatible 
; OPERATING SYSTEM: PC-DOS/MS-DOS 

SOFTWARE: PatentIn Release #1.0, Version #1.30
; CURRENT APPLICATION DATA: 

; APPLICATION NUMBER: US/09/064,199 

; FILING DATE: 22-Apr-1998 

CLASSIFICATION: <Unknown> 
; PRIOR APPLICATION DATA: 

APPLICATION NUMBER: FR 97-04954 
FILING DATE: 22-APR-1997
ATTORNEY/AGENT INFORMATION: 

NAME: Rea, Teresa Stanek 
REGISTRATION NUMBER: 30,427 
REFERENCE/DOCKET NUMBER: 017753-096 
TELECOMMUNICATION INFORMATION: 
TELEPHONE: (703) 836-6620 
TELEFAX: (703) 836-2021
INFORMATION FOR SEQ ID NO: 2: 
SEQUENCE CHARACTERISTICS: 

LENGTH: 4564 base pairs 
; TYPE: nucleic acid 

STRANDEDNESS: single 
TOPOLOGY: linear 
MOLECULE TYPE: DNA (genomic) 
FEATURE: 

NAME/KEY: cIIta gene of type II
; SEQUENCE DESCRIPTION: SEQ ID NO: 2: 

US-09-064-199-2 



Query Match 2.1%; Score 32.8; DB 4; Length 4564;

Best Local Similarity 54.0%; Pred. No. 13; 

Matches 67; Conservative 0; Mismatches 57; Indels 0; Gaps 0; 

Qy 1080 TGGGATGGGGTAGGGCTGGGAGCAGGGGTCTGGCACCTTCCAGGACCCTACTCTGCCTTT 1139 

I I I I I I I I I II I I I I I I I I I I I I I II III I II I I 

Db 3894 TGGCAGGGGCCTGGGCAGGCAGAATGGGGCTGCCTCTGTCACTCCCTCTGGCCTGGCCGT 3835 



Qy 1140 GCCCTTGTGGGATTTCCTTTAAAGCAACCGTGTCGGGCCTTGGTGGAACATCAAATCATG 1199

II I I I I I I I II I I I I II II I I II III II I



Db 3834 GGCTGTCCGCAATGTCCTTCAGAGAAGGCCTCGGGGCTCCTGGGTGGACCGCAGCTGCCA 3775



Qy 1200 CCAG 1203 

I I I I 

Db 3774 CCAG 3771 



RESULT 31 

US-09-064-199-11/c

; Sequence 11, Application US/09064199 
; Patent No. 6632604 

GENERAL INFORMATION: 

APPLICANT: MACH, Bernard 

; TITLE OF INVENTION: NUCLEIC ACID SEQUENCES OF CIITA GENES
; WHICH CAN BE INVOLVED IN CONTROLLING AND REGULATING THE
; EXPRESSION OF GENES ENCODING MHC TYPE II MOLECULES, AND
; THEIR USE, IN PARTICULAR AS DRUGS
; NUMBER OF SEQUENCES: 25
; CORRESPONDENCE ADDRESS: 

ADDRESSEE: BURNS, DOANE, SWECKER & MATHIS, L.L.P. 

STREET: P.O. Box 1404 
; CITY: Alexandria 

; STATE: Virginia 

COUNTRY: United States 
; ZIP: 22313-1404 

; COMPUTER READABLE FORM: 

MEDIUM TYPE: Floppy disk 
; COMPUTER: IBM PC compatible 

OPERATING SYSTEM: PC-DOS/MS-DOS 

SOFTWARE: PatentIn Release #1.0, Version #1.30
CURRENT APPLICATION DATA: 

APPLICATION NUMBER: US/09/064,199
FILING DATE: 22-Apr-1998 
CLASSIFICATION: <Unknown> 
PRIOR APPLICATION DATA: 

APPLICATION NUMBER: FR 97-04954 
; FILING DATE: 22-APR-1997 

; ATTORNEY/AGENT INFORMATION: 

; NAME: Rea, Teresa Stanek 

; REGISTRATION NUMBER: 30,427 

; REFERENCE/DOCKET NUMBER: 017753-096 

TELECOMMUNICATION INFORMATION: 
; TELEPHONE: (703) 836-6620 

TELEFAX: (703) 836-2021 
INFORMATION FOR SEQ ID NO: 11: 
SEQUENCE CHARACTERISTICS: 

LENGTH: 4649 base pairs 
; TYPE: nucleic acid 

; STRANDEDNESS: single

TOPOLOGY: linear 
MOLECULE TYPE: DNA (genomic) 
FEATURE:

NAME/KEY: cIIta of type I
SEQUENCE DESCRIPTION: SEQ ID NO: 11: 



Query Match 2.1%; Score 32.8; DB 4; Length 4649; 

Best Local Similarity 54.0%; Pred. No. 13; 

Matches 67; Conservative 0; Mismatches 57; Indels 0; Gaps 0; 

Qy 1080 TGGGATGGGGTAGGGCTGGGAGCAGGGGTCTGGCACCTTCCAGGACCCTACTCTGCCTTT 1139 

I I I I I I I I I I I I I I I I I I I I I I I I II III III I I 

Db 3979 TGGCAGGGGCCTGGGCAGGCAGAATGGGGCTGCCTCTGTCACTCCCTCTGGCCTGGCCGT 3920 

Qy 1140 GCCCTTGTGGGATTTCCTTTAAAGCAACCGTGTCGGGCCTTGGTGGAACATCAAATCATG 1199 

II I I I I I I I I I I I I I II II I I I I III II I 

Db 3919 GGCTGTCCGCAATGTCCTTCAGAGAAGGCCTCGGGGCTCCTGGGTGGACCGCAGCTGCCA 3860 

Qy 1200 CCAG 1203 

I I I I 

Db 3859 CCAG 3856 



RESULT 32 
US-09-064-199-7/c 

; Sequence 7, Application US/09064199

; Patent No. 6632604 

; GENERAL INFORMATION: 

APPLICANT: MACH, Bernard 
; TITLE OF INVENTION: NUCLEIC ACID SEQUENCES OF CIITA GENES
; WHICH CAN BE INVOLVED IN CONTROLLING AND REGULATING THE
; EXPRESSION OF GENES ENCODING MHC TYPE II MOLECULES, AND
; THEIR USE, IN PARTICULAR AS DRUGS
NUMBER OF SEQUENCES: 25 
CORRESPONDENCE ADDRESS: 

ADDRESSEE: BURNS, DOANE, SWECKER & MATHIS, L.L.P. 

STREET: P.O. Box 1404 

CITY: Alexandria 
; STATE: Virginia 

COUNTRY: United States 

ZIP: 22313-1404 
COMPUTER READABLE FORM: 

MEDIUM TYPE: Floppy disk 

COMPUTER: IBM PC compatible 

OPERATING SYSTEM: PC-DOS/MS-DOS 

SOFTWARE: PatentIn Release #1.0, Version #1.30
CURRENT APPLICATION DATA: 

APPLICATION NUMBER: US/09/064,199

FILING DATE: 22-Apr-1998 

CLASSIFICATION: <Unknown> 
PRIOR APPLICATION DATA: 

APPLICATION NUMBER: FR 97-04954 

FILING DATE: 22-APR-1997 
ATTORNEY/AGENT INFORMATION:
; NAME: Rea, Teresa Stanek 

REGISTRATION NUMBER: 30,427 

REFERENCE/DOCKET NUMBER: 017753-096 
TELECOMMUNICATION INFORMATION: 

TELEPHONE: (703) 836-6620 

TELEFAX: (703) 836-2021 



INFORMATION FOR SEQ ID NO: 7: 
SEQUENCE CHARACTERISTICS: 

LENGTH: 4746 base pairs 
TYPE: nucleic acid 
STRANDEDNESS: single
TOPOLOGY: linear 
MOLECULE TYPE: DNA (genomic) 
FEATURE:

NAME/KEY: cIIta of type I
SEQUENCE DESCRIPTION: SEQ ID NO: 7: 
US-09-064-199-7 

Query Match 2.1%; Score 32.8; DB 4; Length 4746; 

Best Local Similarity 54.0%; Pred. No. 13; 

Matches 67; Conservative 0; Mismatches 57; Indels 0; Gaps 0; 

Qy 1080 TGGGATGGGGTAGGGCTGGGAGCAGGGGTCTGGCACCTTCCAGGACCCTACTCTGCCTTT 1139 

I I I I I I I I I I I I I I I I I I I I I I I I II III I I I I I 

Db 4076 TGGCAGGGGCCTGGGCAGGCAGAATGGGGCTGCCTCTGTCACTCCCTCTGGCCTGGCCGT 4017 

Qy 1140 GCCCTTGTGGGATTTCCTTTAAAGCAACCGTGTCGGGCCTTGGTGGAACATCAAATCATG 1199

II I I I I I I I I I I I I I II II I I I I 111 II I 

Db 4016 GGCTGTCCGCAATGTCCTTCAGAGAAGGCCTCGGGGCTCCTGGGTGGACCGCAGCTGCCA 3957 

Qy 1200 CCAG 1203

I I I I 

Db 3956 CCAG 3953 



RESULT 33 
US-09-064-199-3/c 

; Sequence 3, Application US/09064199 

; Patent No. 6632604 

; GENERAL INFORMATION: 

; APPLICANT: MACH, Bernard 

; TITLE OF INVENTION: NUCLEIC ACID SEQUENCES OF CIITA GENES
; WHICH CAN BE INVOLVED IN CONTROLLING AND REGULATING THE
; EXPRESSION OF GENES ENCODING MHC TYPE II MOLECULES, AND
; THEIR USE, IN PARTICULAR AS DRUGS

; NUMBER OF SEQUENCES: 25 

CORRESPONDENCE ADDRESS: 

ADDRESSEE: BURNS, DOANE, SWECKER & MATHIS, L.L.P. 

STREET: P.O. Box 1404

CITY: Alexandria 
; STATE: Virginia 

COUNTRY: United States 

ZIP: 22313-1404
COMPUTER READABLE FORM: 

MEDIUM TYPE: Floppy disk 

COMPUTER: IBM PC compatible 

OPERATING SYSTEM: PC-DOS/MS-DOS 

SOFTWARE: PatentIn Release #1.0, Version #1.30
CURRENT APPLICATION DATA: 
; APPLICATION NUMBER: US/09/064,199

; FILING DATE: 22-Apr-1998 



CLASSIFICATION: <Unknown> 
PRIOR APPLICATION DATA: 

APPLICATION NUMBER: FR 97-04954 
FILING DATE: 22-APR-1997 
ATTORNEY/AGENT INFORMATION: 

NAME: Rea, Teresa Stanek 
REGISTRATION NUMBER: 30,427 
REFERENCE/DOCKET NUMBER: 017753-096
TELECOMMUNICATION INFORMATION: 
TELEPHONE: (703) 836-6620 
TELEFAX: (703) 836-2021 
INFORMATION FOR SEQ ID NO: 3: 
SEQUENCE CHARACTERISTICS: 

LENGTH: 5105 base pairs 
TYPE: nucleic acid 
STRANDEDNESS: single
TOPOLOGY: linear 
MOLECULE TYPE: DNA (genomic) 
FEATURE: 

NAME/KEY: cIIta gene of type IV
SEQUENCE DESCRIPTION: SEQ ID NO: 3: 
US-09-064-199-3 

Query Match 2.1%; Score 32.8; DB 4; Length 5105; 

Best Local Similarity 54.0%; Pred. No. 14; 

Matches 67; Conservative 0; Mismatches 57; Indels 0; Gaps 0; 

Qy 1080 TGGGATGGGGTAGGGCTGGGAGCAGGGGTCTGGCACCTTCCAGGACCCTACTCTGCCTTT 1139 

I 1 I I I I I I I I I I I I I I I I I I I I I I II III 11 I I I 

Db 4435 TGGCAGGGGCCTGGGCAGGCAGAATGGGGCTGCCTCTGTCACTCCCTCTGGCCTGGCCGT 4376

Qy 1140 GCCCTTGTGGGATTTCCTTTAAAGCAACCGTGTCGGGCCTTGGTGGAACATCAAATCATG 1199 

II I I I I I I I I I I I I I II II I I I I III II I 

Db 4375 GGCTGTCCGCAATGTCCTTCAGAGAAGGCCTCGGGGCTCCTGGGTGGACCGCAGCTGCCA 4316 

Qy 1200 CCAG 1203 

II I I 

Db 4315 CCAG 4312 



RESULT 34 
US-09-064-199-1/c

Sequence 1, Application US/09064199 
Patent No. 6632604 

GENERAL INFORMATION: 

APPLICANT: MACH, Bernard 

; TITLE OF INVENTION: NUCLEIC ACID SEQUENCES OF CIITA GENES
; WHICH CAN BE INVOLVED IN CONTROLLING AND REGULATING THE
; EXPRESSION OF GENES ENCODING MHC TYPE II MOLECULES, AND
; THEIR USE, IN PARTICULAR AS DRUGS
NUMBER OF SEQUENCES: 25 
CORRESPONDENCE ADDRESS: 

ADDRESSEE: BURNS, DOANE, SWECKER & MATHIS, L.L.P.

STREET: P.O. Box 1404

CITY: Alexandria 



STATE: Virginia 
COUNTRY: United States 
ZIP: 22313-1404 
COMPUTER READABLE FORM: 
; MEDIUM TYPE: Floppy disk 

COMPUTER: IBM PC compatible 
OPERATING SYSTEM: PC-DOS/MS-DOS 

SOFTWARE: PatentIn Release #1.0, Version #1.30 
CURRENT APPLICATION DATA: 
; APPLICATION NUMBER: US/09/064,199 

; FILING DATE: 22-Apr-1998 

CLASSIFICATION: <Unknown> 
PRIOR APPLICATION DATA: 

APPLICATION NUMBER: FR 97-04954 
FILING DATE: 22-APR-1997 
ATTORNEY/AGENT INFORMATION: 

NAME: Rea, Teresa Stanek 
; REGISTRATION NUMBER: 30,427 

; REFERENCE/ DOCKET NUMBER: 017753-096 

TELECOMMUNICATION INFORMATION: 
TELEPHONE: (703) 836-6620 
TELEFAX: (703) 836-2021 
INFORMATION FOR SEQ ID NO: 1: 
SEQUENCE CHARACTERISTICS: 

LENGTH: 5463 base pairs 
; TYPE: nucleic acid 

STRANDEDNESS: single 
TOPOLOGY: linear 
MOLECULE TYPE: DNA (genomic) 
; FEATURE: 

; NAME/KEY: cIIta gene of type I 

; SEQUENCE DESCRIPTION: SEQ ID NO: 1: 

US-09-064-199-1 



Query Match 2.1%; Score 32.8; DB 4; Length 5463; 

Best Local Similarity 54.0%; Pred. No. 14; 

Matches 67; Conservative 0; Mismatches 57; Indels 0; Gaps 0; 

Qy 1080 TGGGATGGGGTAGGGCTGGGAGCAGGGGTCTGGCACCTTCCAGGACCCTACTCTGCCTTT 1139 

I I I I I I I I I I I II I I I I I! I I I I I II 111 I I I I I 

Db 4793 TGGCAGGGGCCTGGGCAGGCAGAATGGGGCTGCCTCTGTCACTCCCTCTGGCCTGGCCGT 4734 



Qy 1140 GCCCTTGTGGGATTTCCTTTAAAGCAACCGTGTCGGGCCTTGGTGGAACATCAAATCATG 1199 

II I I I I I I I I I 1 I I I II II I I I I III II I 
Db 4733 GGCTGTCCGCAATGTCCTTCAGAGAAGGCCTCGGGGCTCCTGGGTGGACCGCAGCTGCCA 4674 



Qy 1200 CCAG 1203 

I I 1 I 

Db 4673 CCAG 4670 



RESULT 35 

US-09-489-039A-5645 

; Sequence 5645, Application US/09489039A 

; Patent No. 6610836 

; GENERAL INFORMATION: 

; APPLICANT: Gary Breton et. al 



; TITLE OF INVENTION: NUCLEIC ACID AND AMINO ACID SEQUENCES RELATING TO KLEBSIELLA 

; TITLE OF INVENTION: PNEUMONIAE FOR DIAGNOSTICS AND THERAPEUTICS 

; FILE REFERENCE: 2709.2004001 

; CURRENT APPLICATION NUMBER: US/09/489,039A 

; CURRENT FILING DATE: 2000-01-27 

; PRIOR APPLICATION NUMBER: US 60/117,747 

; PRIOR FILING DATE: 1999-01-29 

; NUMBER OF SEQ ID NOS: 14342 

; SEQ ID NO 5645 

; LENGTH: 648 

; TYPE: DNA 
; ORGANISM: Klebsiella pneumoniae 
US-09-489-039A-5645 

Query Match 2.1%; Score 32.2; DB 4; Length 648; 

Best Local Similarity 56.0%; Pred. No. 6.6; 

Matches 61; Conservative 0; Mismatches 48; Indels 0; Gaps 0; 

Qy 66 CCAGTGCTGTTTGTGCCCTTTGTGTGGCCTCCCCTGCTGTTGGGCTCTCTCTGTCTTTGC 125 

II I I I I I I I II I I I I I I I I I I I I I I 1 I I I I I II 
Db 370 CCGCTGCTGCTGGCCGCCGGAGAGTTGCGCCGGCATCTGCTGCGCTTTTTCTTCCAGTCC 429 

Qy 126 TCCTTAGAGCTGGGGCACCTGAGCCCTCCTCTGTGCCAGCCTTTCTCCC 174 

II III I I I I I I I I I I I I I I I I I II I III 
Db 430 GACCTGACGCAGCAGCGCCAGCGCCCGCGCCTGAGCCGCGCTTTTCCCC 478 



RESULT 36 

US-09-489-039A-5850/c 

; Sequence 5850, Application US/09489039A 

; Patent No. 6610836 

; GENERAL INFORMATION: 

; APPLICANT: Gary Breton et. al 

; TITLE OF INVENTION: NUCLEIC ACID AND AMINO ACID SEQUENCES RELATING TO KLEBSIELLA 

; TITLE OF INVENTION: PNEUMONIAE FOR DIAGNOSTICS AND THERAPEUTICS 

; FILE REFERENCE: 2709.2004001 

; CURRENT APPLICATION NUMBER: US/09/489,039A 

; CURRENT FILING DATE: 2000-01-27 

; PRIOR APPLICATION NUMBER: US 60/117,747 

PRIOR FILING DATE: 1999-01-29 
; NUMBER OF SEQ ID NOS: 14342 
; SEQ ID NO 5850 

LENGTH: 831 

TYPE: DNA 
; ORGANISM: Klebsiella pneumoniae 
US-09-489-039A-5850 

Query Match 2.1%; Score 32.2; DB 4; Length 831; 

Best Local Similarity 56.0%; Pred. No. 7.6; 

Matches 61; Conservative 0; Mismatches 48; Indels 0; Gaps 0; 

Qy 66 CCAGTGCTGTTTGTGCCCTTTGTGTGGCCTCCCCTGCTGTTGGGCTCTCTCTGTCTTTGC 125 

II I II II I I II I I I I I I I I I I I I I I I I I I I I II 
Db 513 CCGCTGCTGCTGGCCGCCGGAGAGTTGCGCCGGCATCTGCTGCGCTTTTTCTTCCAGTCC 454 



Qy 126 TCCTTAGAGCTGGGGCACCTGAGCCCTCCTCTGTGCCAGCCTTTCTCCC 174 

II Ml I I I I I I I I I I I I I II I I I I I III 
Db 453 GACCTGACGCAGCAGCGCCAGCGCCCGCGCCTGAGCCGCGCTTTTCCCC 405 



RESULT 37 
US-09-845-583A-7 

; Sequence 7, Application US/09845583A 



GenCore version 5.1.6 
Copyright (c) 1993 - 2004 Compugen Ltd. 

OM nucleic - nucleic search, using sw model 

Run on:         April 29, 2004, 17:06:46 ; Search time 1526.81 Seconds 
                (without alignments) 
                4651.434 Million cell updates/sec 

Title:          US-09-989-981A-9_COPY_3436_5005 
Perfect score:  1570 
Sequence:       1 cgaagcatcctgaagtacag ctagagagcaaacccagagc 1570 

Scoring table:  IDENTITY_NUC 
                Gapop 10.0 , Gapext 1.0 

Searched:       2936184 seqs, 2261732022 residues 

Total number of hits satisfying chosen parameters: 5872368 

Minimum DB seq length: 0 
Maximum DB seq length: 2000000000 

Post-processing: Minimum Match 0% 
                 Maximum Match 100% 
                 Listing first 50 summaries 

Database : Published_Applications_NA:* 
1: /cgn2_6/ptodata/2/pubpna/US07_PUBCOMB.seq:* 
2: /cgn2_6/ptodata/2/pubpna/PCT_NEW_PUB.seq:* 
3: /cgn2_6/ptodata/2/pubpna/US06_NEW_PUB.seq:* 
4: /cgn2_6/ptodata/2/pubpna/US06_PUBCOMB.seq:* 
5: /cgn2_6/ptodata/2/pubpna/US07_NEW_PUB.seq:* 
6: /cgn2_6/ptodata/2/pubpna/PCTUS_PUBCOMB.seq:* 
7: /cgn2_6/ptodata/2/pubpna/US08_NEW_PUB.seq:* 
8: /cgn2_6/ptodata/2/pubpna/US08_PUBCOMB.seq:* 
9: /cgn2_6/ptodata/2/pubpna/US09A_PUBCOMB.seq:* 
10: /cgn2_6/ptodata/2/pubpna/US09B_PUBCOMB.seq:* 
11: /cgn2_6/ptodata/2/pubpna/US09C_PUBCOMB.seq:* 
12: /cgn2_6/ptodata/2/pubpna/US09_NEW_PUB.seq:* 
13: /cgn2_6/ptodata/2/pubpna/US09_NEW_PUB.seq2:* 
14: /cgn2_6/ptodata/2/pubpna/US10A_PUBCOMB.seq:* 
15: /cgn2_6/ptodata/2/pubpna/US10B_PUBCOMB.seq:* 
16: /cgn2_6/ptodata/2/pubpna/US10C_PUBCOMB.seq:* 
17: /cgn2_6/ptodata/2/pubpna/US10_NEW_PUB.seq:* 
18: /cgn2_6/ptodata/2/pubpna/US60_NEW_PUB.seq:* 
19: /cgn2_6/ptodata/2/pubpna/US60_PUBCOMB.seq:* 

Pred. No. is the number of results predicted by chance to have a 
score greater than or equal to the score of the result being printed, 
and is derived by analysis of the total score distribution. 
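
The Query Match and Best Local Similarity percentages reported with each result are not defined in this listing. The values shown are consistent with Query Match being the result score divided by the perfect score (1570 for this query), and Best Local Similarity being matches divided by aligned columns (matches + mismatches + indels). A minimal Python sketch under that assumption follows; the function names are illustrative and are not part of GenCore.

```python
def query_match_percent(score: float, perfect_score: float) -> float:
    """Score as a percentage of the perfect score, to one decimal place.

    Assumption: this ratio is inferred from the values in the listing,
    not taken from GenCore documentation.
    """
    return round(100.0 * score / perfect_score, 1)


def best_local_similarity_percent(matches: int, mismatches: int, indels: int) -> float:
    """Matches as a percentage of all aligned columns, to one decimal place.

    Assumption: aligned columns = matches + mismatches + indels,
    again inferred from the values in the listing.
    """
    aligned_columns = matches + mismatches + indels
    return round(100.0 * matches / aligned_columns, 1)


if __name__ == "__main__":
    # Figures copied from the statistics lines of individual results.
    print(query_match_percent(1568, 1570))
    print(best_local_similarity_percent(576, 450, 31))
```

Checking a result's header against these two ratios is a quick way to spot digits that were split or dropped during text extraction.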
ALIGNMENTS 



RESULT 1 

US-09-989-981A-9 

Sequence 9, Application US/09989981A 
Publication No. US20030049730A1 
GENERAL INFORMATION: 
APPLICANT: Hobbs, Helen H. 
APPLICANT: Shan, Bei 
APPLICANT: Barnes, Robert 
APPLICANT: Tian, Hui 
APPLICANT: Tularik Inc. 

APPLICANT: Board of Regents, The University of Texas System 
TITLE OF INVENTION: ABCG5 and ABCG8: Compositions and Methods of Use 
FILE REFERENCE: 018781-007320US 
CURRENT APPLICATION NUMBER: US/09/989,981A 
CURRENT FILING DATE: 2002-07-23 
PRIOR APPLICATION NUMBER: US 60/252,235 
PRIOR FILING DATE: 2000-11-20 
PRIOR APPLICATION NUMBER: US 60/253,645 
PRIOR FILING DATE: 2000-11-28 
NUMBER OF SEQ ID NOS: 13 
SOFTWARE: PatentIn Ver. 2.1 
SEQ ID NO 9 
LENGTH: 6043 
TYPE: DNA 

ORGANISM: Homo sapiens 
FEATURE: 

OTHER INFORMATION: ABCG8 exon 2 (reverse strand) through ABCG5 exon 2 
OTHER INFORMATION: (forward strand) 
US-09-989-981A-9 

Query Match 99.9%; Score 1568; DB 10; Length 6043; 

Best Local Similarity 100.0%; Pred. No. 0; 

Matches 1570; Conservative 0; Mismatches 0; Indels 0; Gaps 0; 

Qy          1 CGAAGCATCCTGAAGTACAGTCCCATTCCACAGCTGGGTCTCTTCTTTGGTTTTCTCAGC 60
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db       3436 CGAAGCATCCTGAAGTACAGTCCCATTCCACAGCTGGGTCTCTTCTTTGGTTTTCTCAGC 3495

Qy         61 CATGACCAGTGCTGTTTGTGCCCTTTGTGTGGCCTCCCCTGCTGTTGGGCTCTCTCTGTC 120
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db       3496 CATGACCAGTGCTGTTTGTGCCCTTTGTGTGGCCTCCCCTGCTGTTGGGCTCTCTCTGTC 3555

Qy        121 TTTGCTCCTTAGAGCTGGGGCACCTGAGCCCTCCTCTGTGCCAGCCTTTCTCCCAGCATT 180
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db       3556 TTTGCTCCTTAGAGCTGGGGCACCTGAGCCCTCCTCTGTGCCAGCCTTTCTCCCAGCATT 3615

Qy        181 CCTYTCTGGCAAACACTTCCTATAAACACACCGTGTGTTCTGCCTATTGTCGAGATAAGG 240
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db       3616 CCTYTCTGGCAAACACTTCCTATAAACACACCGTGTGTTCTGCCTATTGTCGAGATAAGG 3675

Qy        241 ACACTCTGGCTAAAGGTACATCAGATAATGGCATCGTTGGCCAAATTGGTGAACTGTTAT 300
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db       3676 ACACTCTGGCTAAAGGTACATCAGATAATGGCATCGTTGGCCAAATTGGTGAACTGTTAT 3735

Qy        301 CTCACGAGGATTCCAGGGCTGGGTAGGATCGGACAGGGCACTCCCATTGGCTCCTCAGTT 360
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db       3736 CTCACGAGGATTCCAGGGCTGGGTAGGATCGGACAGGGCACTCCCATTGGCTCCTCAGTT 3795

Qy        361 AAAGCTGCCCTGGAGCCGGACAGGCCACTAGAAAATTCACTTGCATTTGCTTCCTGCTAG 420
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db       3796 AAAGCTGCCCTGGAGCCGGACAGGCCACTAGAAAATTCACTTGCATTTGCTTCCTGCTAG 3855

Qy        421 CCATGGGTGAGCTGCCCTTTCTGAGTCCAGAGGGAGCCAGAGGGCCTCACATCAACAGAG 480
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db       3856 CCATGGGTGAGCTGCCCTTTCTGAGTCCAGAGGGAGCCAGAGGGCCTCACATCAACAGAG 3915

Qy        481 GGTCTCTGAGCTCCCTGGAGCAAGGTTCGGTCACGGGCACAGAGGCTCGGCACAGCTTAG 540
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db       3916 GGTCTCTGAGCTCCCTGGAGCAAGGTTCGGTCACGGGCACAGAGGCTCGGCACAGCTTAG 3975

Qy        541 GTGTCCTGCATGTGTCCTACAGCGTCAGGTAAGGGGACCTCCACAGCAAAAAGCTAGGCT 600
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db       3976 GTGTCCTGCATGTGTCCTACAGCGTCAGGTAAGGGGACCTCCACAGCAAAAAGCTAGGCT 4035

Qy        601 CTCTGATTGCCTTTTCTGAATGGGTGGGTGGGCCTGTGGGCTTTGGGTTGTCTGTCCAGC 660
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db       4036 CTCTGATTGCCTTTTCTGAATGGGTGGGTGGGCCTGTGGGCTTTGGGTTGTCTGTCCAGC 4095

Qy        661 AGATCAGGGTGAAAGTGGACAGTCTGTAACAACAGTGAGTCGTTCCTCCTCCTCCTCCTG 720
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db       4096 AGATCAGGGTGAAAGTGGACAGTCTGTAACAACAGTGAGTCGTTCCTCCTCCTCCTCCTG 4155

Qy        721 CGCAGGGCAGAGCCTGGACATTAAAACATGCCCTGCCTGAAGCCGCTTGCTGCTTCTCAC 780
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db       4156 CGCAGGGCAGAGCCTGGACATTAAAACATGCCCTGCCTGAAGCCGCTTGCTGCTTCTCAC 4215

Qy        781 TGATTTCTGCTCTCCCCTTCCTTGACTCGCCCACCACCTGTCCTGTGTAGATGGAGAAGG 840
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db       4216 TGATTTCTGCTCTCCCCTTCCTTGACTCGCCCACCACCTGTCCTGTGTAGATGGAGAAGG 4275

Qy        841 CTCGGAGAGTGGGGGTGCTGGGGGCACAAAATGGAATGAACACTGCTGAAGGAATGCAGG 900
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db       4276 CTCGGAGAGTGGGGGTGCTGGGGGCACAAAATGGAATGAACACTGCTGAAGGAATGCAGG 4335

Qy        901 GTTCACTTCAAGAAGAAAGCAGTGTGCAGGTGTACCATCTCCCAGTCAGAGACCCAGTAA 960
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db       4336 GTTCACTTCAAGAAGAAAGCAGTGTGCAGGTGTACCATCTCCCAGTCAGAGACCCAGTAA 4395

Qy        961 TCAGAGCAGCTAATGGGAGGCATGCTCCTTGGGTGGTGGCCAACTTGTCATTATACCTCC 1020
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db       4396 TCAGAGCAGCTAATGGGAGGCATGCTCCTTGGGTGGTGGCCAACTTGTCATTATACCTCC 4455

Qy       1021 AAGGACAACAGAGTGGTACATAAGGCTAAAACAGAGTTGTCAACCTGTCCAGGGGCAACT 1080
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db       4456 AAGGACAACAGAGTGGTACATAAGGCTAAAACAGAGTTGTCAACCTGTCCAGGGGCAACT 4515

Qy       1081 GGGATGGGGTAGGGCTGGGAGCAGGGGTCTGGCACCTTCCAGGACCCTACTCTGCCTTTG 1140
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db       4516 GGGATGGGGTAGGGCTGGGAGCAGGGGTCTGGCACCTTCCAGGACCCTACTCTGCCTTTG 4575

Qy       1141 CCCTTGTGGGATTTCCTTTAAAGCAACCGTGTCGGGCCTTGGTGGAACATCAAATCATGC 1200
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db       4576 CCCTTGTGGGATTTCCTTTAAAGCAACCGTGTCGGGCCTTGGTGGAACATCAAATCATGC 4635

Qy       1201 CAGCAGAAGTGGGACAGGCAAATCCTCAAAGATGTCTCCTTGTACATCGAGAGTGGCCAG 1260
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db       4636 CAGCAGAAGTGGGACAGGCAAATCCTCAAAGATGTCTCCTTGTACATCGAGAGTGGCCAG 4695

Qy       1261 ATTATGTGCATCTTAGGCAGCTCAGGTAAGTGCCTGGGGGGSCSGGGGCTCCTGTACTTC 1320
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db       4696 ATTATGTGCATCTTAGGCAGCTCAGGTAAGTGCCTGGGGGGSCSGGGGCTCCTGTACTTC 4755

Qy       1321 TAAGGCAGGCTCTGGGAGGCTTTGGCTCYGTCTAAGCACAATGTTTAAGAAGTRAGTTTA 1380
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db       4756 TAAGGCAGGCTCTGGGAGGCTTTGGCTCYGTCTAAGCACAATGTTTAAGAAGTRAGTTTA 4815

Qy       1381 AGTTGTAGAGAGGCAGCCATGCATTTGGCATTTGAATACAATCTGGTGACTTGTCTGGCT 1440
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db       4816 AGTTGTAGAGAGGCAGCCATGCATTTGGCATTTGAATACAATCTGGTGACTTGTCTGGCT 4875

Qy       1441 GCCAATAGAACCTAGTACCAAAGTGAAATCTTGAGGAAAATCCCTGGAAAGAGTGGAAAG 1500
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db       4876 GCCAATAGAACCTAGTACCAAAGTGAAATCTTGAGGAAAATCCCTGGAAAGAGTGGAAAG 4935

Qy       1501 TCCTGCCTAACACGTAAGTGCCTTCTTTGCTTGTTTGATTGACTGTGATGCTAGAGAGCA 1560
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db       4936 TCCTGCCTAACACGTAAGTGCCTTCTTTGCTTGTTTGATTGACTGTGATGCTAGAGAGCA 4995

Qy       1561 AACCCAGAGC 1570
              ||||||||||
Db       4996 AACCCAGAGC 5005



RESULT 2 

US-09-989-981A-10 

Sequence 10, Application US/09989981A 
Publication No. US20030049730A1 
GENERAL INFORMATION: 
APPLICANT: Hobbs, Helen H. 
APPLICANT: Shan, Bei 
APPLICANT: Barnes, Robert 
APPLICANT: Tian, Hui 
APPLICANT: Tularik Inc. 

APPLICANT: Board of Regents, The University of Texas System 
TITLE OF INVENTION: ABCG5 and ABCG8: Compositions and Methods of Use 
FILE REFERENCE: 018781-007320US 
CURRENT APPLICATION NUMBER: US/09/989,981A 
CURRENT FILING DATE: 2002-07-23 
PRIOR APPLICATION NUMBER: US 60/252,235 
PRIOR FILING DATE: 2000-11-20 
PRIOR APPLICATION NUMBER: US 60/253,645 
PRIOR FILING DATE: 2000-11-28 
NUMBER OF SEQ ID NOS: 13 
SOFTWARE: PatentIn Ver. 2.1 
SEQ ID NO 10 
LENGTH: 359 
TYPE: DNA 

ORGANISM: Homo sapiens 
FEATURE: 

OTHER INFORMATION: sequence between ABCG5 and ABCG8 containing 
OTHER INFORMATION: control sequences (bidirectional promoter) 



US-09-989-981A-10 



Query Match 22.8%; Score 358.6; DB 10; Length 359; 

Best Local Similarity 100.0%; Pred. No. 1e-110; 

Matches 359; Conservative 0; Mismatches 0; Indels 0; Gaps 0; 

Qy         64 GACCAGTGCTGTTTGTGCCCTTTGTGTGGCCTCCCCTGCTGTTGGGCTCTCTCTGTCTTT 123
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db          1 GACCAGTGCTGTTTGTGCCCTTTGTGTGGCCTCCCCTGCTGTTGGGCTCTCTCTGTCTTT 60

Qy        124 GCTCCTTAGAGCTGGGGCACCTGAGCCCTCCTCTGTGCCAGCCTTTCTCCCAGCATTCCT 183
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db         61 GCTCCTTAGAGCTGGGGCACCTGAGCCCTCCTCTGTGCCAGCCTTTCTCCCAGCATTCCT 120

Qy        184 YTCTGGCAAACACTTCCTATAAACACACCGTGTGTTCTGCCTATTGTCGAGATAAGGACA 243
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db        121 YTCTGGCAAACACTTCCTATAAACACACCGTGTGTTCTGCCTATTGTCGAGATAAGGACA 180

Qy        244 CTCTGGCTAAAGGTACATCAGATAATGGCATCGTTGGCCAAATTGGTGAACTGTTATCTC 303
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db        181 CTCTGGCTAAAGGTACATCAGATAATGGCATCGTTGGCCAAATTGGTGAACTGTTATCTC 240

Qy        304 ACGAGGATTCCAGGGCTGGGTAGGATCGGACAGGGCACTCCCATTGGCTCCTCAGTTAAA 363
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db        241 ACGAGGATTCCAGGGCTGGGTAGGATCGGACAGGGCACTCCCATTGGCTCCTCAGTTAAA 300

Qy        364 GCTGCCCTGGAGCCGGACAGGCCACTAGAAAATTCACTTGCATTTGCTTCCTGCTAGCC 422
              |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db        301 GCTGCCCTGGAGCCGGACAGGCCACTAGAAAATTCACTTGCATTTGCTTCCTGCTAGCC 359



RESULT 3 

US-10-104-047-825 

; Sequence 825, Application US/10104047 
; Publication No. US20030236392A1 
; GENERAL INFORMATION: 

; APPLICANT: HELIX RESEARCH INSTITUTE 

; TITLE OF INVENTION: Novel full length cDNA 
; FILE REFERENCE: H1-A0105 
; CURRENT APPLICATION NUMBER: US/10/104,047 
; CURRENT FILING DATE: 2002-03-25 
; PRIOR APPLICATION NUMBER: 
; PRIOR FILING DATE: 
; NUMBER OF SEQ ID NOS: 4096 
; SOFTWARE: PatentIn Ver. 2.1 
; SEQ ID NO 825 
; LENGTH: 2512 
; TYPE: DNA 
; ORGANISM: Homo sapiens 
US-10-104-047-825 

Query Match 13.7%; Score 215; DB 16; Length 2512; 

Best Local Similarity 54.5%; Pred. No. 2.2e-61; 

Matches 576; Conservative 0; Mismatches 450; Indels 31; Gaps 6; 

Qy 237 AAGGACACTCTGGCTAAAGGTACATCAGATAATGGCATCGTTGGCCAAATTGGTGAACTG 296 

I II I I I I I I M I I II I I M I I I I I I I I I I I I I I II I I I I I I I I III 



Db 1 AAGGACGCGCTGGCTAAAGGTACATCAGATAATGGTCTCCGTGGCCAAGTCCCATTCCTG 60 



Qy 297 TTATCTCACGAGGATTCCAGGGCTGGGTAGGATCGGACAGGGCACTCCCATTGGCTCCTC 356 

I I I I I I I II I I I I I I I I I I I I I I I I I I I I I 

Db 61 CTGTCCCAAGGGACTCCGGGGTCAGGTGGAGCAGGCAGGGCAGTCTGCCACGGGCTCCCC 120 

Qy 357 AGTTAAAGCTGCCCTGGAGCCGGACAGGCCACTAGAAAATTCACTTGCATTTGCTTCCTG 416 

I I I I I I I I I I I I I I I I I I I I 1 I I I I I I I I I I I I I I I I I I I 

Db 121 AACTGAAGCCACTCTGGGGAGGGTCCGGCCACCAGAAAATTTGCCCAGCTTTGCTGCCTG 180 

Qy 417 CTAGCCATGGGTGAGCTGCCCTTTCTGAGTCCAGAGGGAGCCAGAGGGCCTCACATCAAC 476 

I I I I I I I I I I I I I I I I I I I I II I- II I M Ml II I I I I 
Db 181 TTGGCCATGGGTGACCTCTCATCTTTGACCCCCGGAGGGTCCATGGGTCTCCAAGTAAAC 240 

Qy 477 AGAGGGTCTCTGAGCTCCCTGGAGCAAGGTTCGGTCACGGGCACAGAGGCTCGGCACAGC 536 

I I I I I II I I II I II I I I I I I I I I I I III I I I M I II I I I I I I 
Db 241 AGAGGCTCCCAGAGCTCCCTGGAGGGGGCTCCTGCCACCGCCCCGGAGCCT---CACAGC 297 

Qy 537 TTAGGTGTCCTGCATGTGTCCTACAGCGTCAGGTAAGGGGACCTCCACAGCAAAAAGCTA 596 

III I I I I I M I I I I I I I I I I I I I I I II I I I I II III 
Db 298 CTGGGCATCCTCCATGCCTCCTACAGCGTCAGGTAAGGCAGAGCCC--------TTGCTG 349 

Qy 597 GGCTCTCTGATTGCCTTTTCTGAATGGGTGGGTGGGCCTGTGGGCTTTGGGTTGTCTGTC 656 

II IE II I I I I I I I I I II 

Db 350 CTGCTGCTCCCCCAGGAGTGCGGGGCCCGGCGCTCACCCCTCTGCTGCCTTTCTTCACTC 409 

Qy 657 CAGCAGATCAGGGTGAAAGTGGACAGTCTGTAACAACAGTGAGTCGTTCCTCCTCCTCCT 716 

I I I I I I I I I I I I I I I I 

Db 410 TTTAAGTGCCAGTCTGGGCACTTCGGGCTCCCTCTTTAGTGGATCGGGTGGAGAGAGGAG 469 

Qy 717 CCTGCGCAGGGCAGAGCCTGGACATTAAAACATGCCCTGCCTGAAGCCGCTTGCTGCTTC 776 

I I I I I I I I I I I I I I I I II II III I I I I 

Db 470 AGGGAGAAGGGCTGTGCTGGGAAACATGGAGCGACAGTGAATGGCCCCTCCCCCTGCCCA 529 

Qy 777 TCACTGATTTCTGCTCTCCCCTTCCTTGACTC-GCCCACCACCTGTCCTGTGTAGATGGA 835 

I I I I II I I I I I I I I I I I I I I I 

Db 530 GGGAAGGGCCTGGGCATAAACAAAGTGGCAGCAGTGCCCTGCCAACCCAGTGTCTACGGC 589 

Qy 836 GAAGGCTC------GGAGAGTGGGGGTGCTGGGGGCACAAAATGGAATGAACACTGCTGA 889 

Ml I I I I I II I I I I I I I III I I I I I I I I I I I 

Db 590 CTGCCCTCTGTGGATGGGAATGGGGGTACTGCGAATGCAAGGAGTCTTGAAACCTGGTGA 649 

Qy 890 AGGAATGCAGGGTTCACTTCAAGAAGAAAGCAGTGTGCAGGTGTACCATCTCCCAGTCAG 949 

I I I I I I I I I I I II I I I I I II I I I I I 

Db 650 AAGAAT GCAGGG AC AGC C AC C T C G CAGC CAAAC G GAC AG GAC AT T C AGAGC AAC 7 03 

Qy 950 AGACCCAGTAATCAGAGCAGCTAATGGGAGGCATGCTCCTTGGGTGGTGGCCAACTTGTC 1009 

II I I II I I I I I I I I I I I I I I I I I I 

Db 704 TCCAGCACAGGCCCCCTCCCTACGTGGCAGACAGCCTCAGTCGCTATCTGCCAGGTTCT- 762 

Qy 1010 ATTATACCTCCAAGGACAACAGAGTGGTACATAAGGCTAAAACAGAGTTGTCAACCTGTC 1069 

I I I I I I I I I I I II I II I 

Db 7 63 AC AGAG GAG G GC G C AGAGACT GAAACAC GT TAG GAG CCTGTCCG GAGAC TACT G 816 

Qy 1070 CAGGGGCAACTGGGATGGGGTAGGGCTGGGAGCAGGGGTCTGGCACCTTCCAGGACCCTA 1129 

| M II II I I I I I I I I I I II I I II I I I I I I I II I I I I 

Db 817 GGGTGGGGCACAGGTAGGATCAATGCTGGGGACCTGGGTGTGGCCCCTTCCAGGGCCCCA 876 



Qy       1130 CTCTGCCTTTGCCCTTGTGGGATTTCCTTTAAAGCAACCGTGTCGGGCCTTGGTGGAACA 1189

Db        877 AGCTGCCTTTGCCTTCCTGGGGTTTCCTTTAAAGCCACCGCGTGAGGCCCTGGTGGGACA 936

Qy       1190 TCAAATCATGCCAGCAGAAGTGGGACAGGCAAATCCTCAAAGATGTCTCCTTGTACATCG 1249

Db        937 TCACATCTTGCCGGCAGCAGTGGACCAGGCAGATCCTCAAAGATGTCTCCTTGTACGTGG 996

Qy       1250 AGAGTGGCCAGATTATGTGCATCTTAGGCAGCTCAGG 1286
              |||| || ||||| ||||||||| |||| ||||||||
Db        997 AGAGCGGGCAGATCATGTGCATCCTAGGAAGCTCAGG 1033

RESULT 4 
US-09-837-992-2 

; Sequence 2, Application US/09837992 

; Patent No. US20020081687A1 
; GENERAL INFORMATION: 
; APPLICANT: Tian, Hui 
; APPLICANT: Schultz, Joshua 
; APPLICANT: Shan, Bei 
; APPLICANT: Tularik Inc. 
; TITLE OF INVENTION: Sitosterolemia Susceptibility Gene (SSG): Compositions 
; TITLE OF INVENTION: and Methods of Use 
; FILE REFERENCE: 018781-006020US 
; CURRENT APPLICATION NUMBER: US/09/837,992 
; CURRENT FILING DATE: 2001-04-18 
; PRIOR APPLICATION NUMBER: US 60/198,465 
; PRIOR FILING DATE: 2000-04-18 
; PRIOR APPLICATION NUMBER: US 60/204,234 
; PRIOR FILING DATE: 2000-05-15 
; NUMBER OF SEQ ID NOS: 45 
; SOFTWARE: PatentIn Ver. 2.1 
; SEQ ID NO 2 
; LENGTH: 2258 
; TYPE: DNA 
; ORGANISM: Mus musculus 
; FEATURE: 
; OTHER INFORMATION: mouse sitosterolemia susceptibility gene (SSG) 
; NAME/KEY: CDS 
; LOCATION: (47)..(2005) 
; OTHER INFORMATION: mouse sitosterolemia susceptibility gene (SSG) 
; OTHER INFORMATION: protein 
US-09-837-992-2 

Query Match 12.2%; Score 191.4; DB 9; Length 2258; 

Best Local Similarity 97.0%; Pred. No. 2.2e-53; 

Matches 195; Conservative 0; Mismatches 6; Indels 0; Gaps 0; 

Qy        378 GGACAGGCCACTAGAAAATTCACTTGCATTTGCTTCCTGCTAGCCATGGGTGAGCTGCCC 437
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db          2 GGACAGGCCACTAGAAAATTCACTTGCATTTGCTTCCTGCTAGCCATGGGTGAGCTGCCC 61

Qy        438 TTTCTGAGTCCAGAGGGAGCCAGAGGGCCTCACATCAACAGAGGGTCTCTGAGCTCCCTG 497
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db         62 TTTCTGAGTCCAGAGGGAGCCAGAGGGCCTCACATCAACAGAGGGTCTCTGAGCTCCCTG 121

Qy        498 GAGCAAGGTTCGGTCACGGGCACAGAGGCTCGGCACAGCTTAGGTGTCCTGCATGTGTCC 557
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db        122 GAGCAAGGTTCGGTCACGGGCACAGAGGCTCGGCACAGCTTAGGTGTCCTGCATGTGTCC 181

Qy        558 TACAGCGTCAGGTAAGGGGAC 578
              |||||||||||  |  | | |
Db        182 TACAGCGTCAGCAACCGTGTC 202



RESULT 5 

US-09-989-981A-1 

Sequence 1, Application US/09989981A 
Publication No. US20030049730A1 
GENERAL INFORMATION: 
APPLICANT: Hobbs, Helen H. 
APPLICANT: Shan, Bei 
APPLICANT: Barnes, Robert 
APPLICANT: Tian, Hui 
APPLICANT: Tularik Inc. 

APPLICANT: Board of Regents, The University of Texas System 
TITLE OF INVENTION: ABCG5 and ABCG8 : Compositions and Methods of Use 
FILE REFERENCE: 018781-007320US 
CURRENT APPLICATION NUMBER: US/09/989,981A 
CURRENT FILING DATE: 2002-07-23 
PRIOR APPLICATION NUMBER: US 60/252,235 
PRIOR FILING DATE: 2000-11-20 
PRIOR APPLICATION NUMBER: US 60/253,645 
PRIOR FILING DATE: 2000-11-28 
NUMBER OF SEQ ID NOS: 13 
SOFTWARE: PatentIn Ver. 2.1 
SEQ ID NO 1 
LENGTH: 1959 
TYPE: DNA 

ORGANISM: Mus musculus 
FEATURE: 
NAME/KEY: CDS 
LOCATION: (1)..(1959) 

OTHER INFORMATION: mouse ABCG5 (mABCG5) 
US-09-989-981A-1 

Query Match 9.3%; Score 146.4; DB 10; Length 1959; 

Best Local Similarity 96.2%; Pred. No. 4.2e-38; 

Matches 150; Conservative 0; Mismatches 6; Indels 0; Gaps 0; 

Qy        423 ATGGGTGAGCTGCCCTTTCTGAGTCCAGAGGGAGCCAGAGGGCCTCACATCAACAGAGGG 482
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db          1 ATGGGTGAGCTGCCCTTTCTGAGTCCAGAGGGAGCCAGAGGGCCTCACATCAACAGAGGG 60

Qy        483 TCTCTGAGCTCCCTGGAGCAAGGTTCGGTCACGGGCACAGAGGCTCGGCACAGCTTAGGT 542
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db         61 TCTCTGAGCTCCCTGGAGCAAGGTTCGGTCACGGGCACAGAGGCTCGGCACAGCTTAGGT 120

Qy        543 GTCCTGCATGTGTCCTACAGCGTCAGGTAAGGGGAC 578
              ||||||||||||||||||||||||||  |  | | |
Db        121 GTCCTGCATGTGTCCTACAGCGTCAGCAACCGTGTC 156



RESULT 6 
US-09-837-992-7 

; Sequence 7, Application US/09837992 

; Patent No. US20020081687A1 

; GENERAL INFORMATION: 

; APPLICANT: Tian, Hui 

; APPLICANT: Schultz, Joshua 

; APPLICANT: Shan, Bei 

; APPLICANT: Tularik Inc. 

; TITLE OF INVENTION: Sitosterolemia Susceptibility Gene (SSG): Compositions 

; TITLE OF INVENTION: and Methods of Use 

; FILE REFERENCE: 018781-006020US 

; CURRENT APPLICATION NUMBER: US/09/837,992 

; CURRENT FILING DATE: 2001-04-18 

; PRIOR APPLICATION NUMBER: US 60/198,465 

; PRIOR FILING DATE: 2000-04-18 

; PRIOR APPLICATION NUMBER: US 60/204,234 

; PRIOR FILING DATE: 2000-05-15 

; NUMBER OF SEQ ID NOS: 45 

; SOFTWARE: PatentIn Ver. 2.1 

; SEQ ID NO 7 

LENGTH: 249 

TYPE: DNA 

ORGANISM: Homo sapiens 
; FEATURE: 

OTHER INFORMATION: exon 1 of hSSG 
US-09-837-992-7 

Query Match 6.5%; Score 101.6; DB 9; Length 249; 

Best Local Similarity 68.4%; Pred. No. 2.2e-23; 

Matches 156; Conservative 0; Mismatches 69; Indels 3; Gaps 1; 

Qy 341 CTCCCATTGGCTCCTCAGTTAAAGCTGCCCTGGAGCCGGACAGGCCACTAGAAAATTCAC 400 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I 
Db 25 CTGCCACGGGCTCCCCAACTGAAGCCACTCTGGGGAGGGTCCGGCCACCAGAAAATTTGC 84 

Qy 401 TTGCATTTGCTTCCTGCTAGCCATGGGTGAGCTGCCCTTTCTGAGTCCAGAGGGAGCCAG 460 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I III II III 
Db 85 CCAGCTTTGCTGCCTGTTGGCCATGGGTGACCTCTCATCTTTGACCCCCGGAGGGTCCAT 144 

Qy 461 AGGGCCTCACATCAACAGAGGGTCTCTGAGCTCCCTGGAGCAAGGTTCGGTCACGGGCAC 520 

Ml M I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 145 GGGTCTCCAAGTAAACAGAGGCTCCCAGAGCTCCCTGGAGGGGGCTCCTGCCACCGCCCC 204 

Qy 521 AGAGGCTCGGCACAGCTTAGGTGTCCTGCATGTGTCCTACAGCGTCAG 568 

I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I 
Db 205 GGAGCCT---CACAGCCTGGGCATCCTCCATGCCTCCTACAGCGTCAG 249 



RESULT 7 
US-09-837-992-4 

; Sequence 4, Application US/09837992 

; Patent No. US20020081687A1 

; GENERAL INFORMATION: 

; APPLICANT: Tian, Hui 

; APPLICANT: Schultz, Joshua 



APPLICANT: Shan, Bei 
APPLICANT: Tularik Inc. 

TITLE OF INVENTION: Sitosterolemia Susceptibility Gene (SSG) : Compositions 
TITLE OF INVENTION: and Methods of Use 
FILE REFERENCE: 018781-006020US 
CURRENT APPLICATION NUMBER: US/09/837,992 
CURRENT FILING DATE: 2001-04-18 
PRIOR APPLICATION NUMBER: US 60/198,465 
PRIOR FILING DATE: 2000-04-18 
PRIOR APPLICATION NUMBER: US 60/204,234 
PRIOR FILING DATE: 2000-05-15 
NUMBER OF SEQ ID NOS: 45 
SOFTWARE: PatentIn Ver. 2.1 
SEQ ID NO 4 
LENGTH: 2340 
TYPE: DNA 

ORGANISM: Homo sapiens 
FEATURE: 

OTHER INFORMATION: human sitosterolemia gene (SSG) 
NAME/KEY: CDS 
LOCATION: (107)..(2062) 

OTHER INFORMATION: human sitosterolemia susceptibility gene (SSG) 
OTHER INFORMATION: protein 
US-09-837-992-4 

Query Match 6.5%; Score 101.6; DB 9; Length 2340; 

Best Local Similarity 67.4%; Pred. No. 8e-23; 

Matches 159; Conservative 0; Mismatches 74; Indels 3; Gaps 1; 

Qy 341 CTCCCATTGGCTCCTCAGTTAAAGCTGCCCTGGAGCCGGACAGGCCACTAGAAAATTCAC 400 

I I I 1 I I I I I I I I I I I I I I I I II I I I I I I II I I I I I I I I I I I I 
Db 25 CTGCCACGGGCTCCCCAACTGAAGCCACTCTGGGGAGGGTCCGGCCACCAGAAAATTTGC 84 

Qy 401 TTGCATTTGCTTCCTGCTAGCCATGGGTGAGCTGCCCTTTCTGAGTCCAGAGGGAGCCAG 460 

I I I II I I I I I I I I I II I I I I I I I I I I I I I I III II III 
Db 85 CCAGCTTTGCTGCCTGTTGGCCATGGGTGACCTCTCATCTTTGACCCCCGGAGGGTCCAT 144 

Qy 461 AGGGCCTCACATCAACAGAGGGTCTCTGAGCTCCCTGGAGCAAGGTTCGGTCACGGGCAC 520 

Ml II I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I 
Db 145 GGGTCTCCAAGTAAACAGAGGCTCCCAGAGCTCCCTGGAGGGGGCTCCTGCCACCGCCCC 204 

Qy 521 AGAGGCTCGGCACAGCTTAGGTGTCCTGCATGTGTCCTACAGCGTCAGGTAAGGGG 576 

I I I I I I I I I I I I I I M I I I I I I I I I I II I I I I I I II I M 
Db 205 GGAGCCT---CACAGCCTGGGCATCCTCCATGCCTCCTACAGCGTCAGCCACCGCG 257 



RESULT 8 

US-09-989-981A-5 

Sequence 5, Application US/09989981A 
Publication No. US20030049730A1 
GENERAL INFORMATION: 
APPLICANT: Hobbs, Helen H. 
APPLICANT: Shan, Bei 
APPLICANT: Barnes, Robert 
APPLICANT: Tian, Hui 
APPLICANT: Tularik Inc. 

APPLICANT: Board of Regents, The University of Texas System 



TITLE OF INVENTION: ABCG5 and ABCG8 : Compositions and Methods of Use 
FILE REFERENCE: 018781-007320US 
CURRENT APPLICATION NUMBER: US/09/989, 981A 
CURRENT FILING DATE: 2002-07-23 
PRIOR APPLICATION NUMBER: US 60/252,235 
PRIOR FILING DATE: 2000-11-20 
PRIOR APPLICATION NUMBER: US 60/253,645 
PRIOR FILING DATE: 2000-11-28 
NUMBER OF SEQ ID NOS: 13 
SOFTWARE: PatentIn Ver. 2.1 
SEQ ID NO 5 
LENGTH: 2340 
TYPE: DNA 

ORGANISM: Homo sapiens 
FEATURE: 
NAME/KEY: CDS 
LOCATION: (107)..(2062) 

OTHER INFORMATION: human ABCG5 (hABCG5) 
US-09-989-981A-5 

Query Match 6.5%; Score 101.6; DB 10; Length 2340; 

Best Local Similarity 67.4%; Pred. No. 8e-23; 

Matches 159; Conservative 0; Mismatches 74; Indels 3; Gaps 1; 

Qy 341 CTCCCATTGGCTCCTCAGTTAAAGCTGCCCTGGAGCCGGACAGGCCACTAGAAAATTCAC 400 

I I I I I I I I I I I I I I I I I I 1 I II I I I I I I I I I I I I I I I I I I I I 
Db 25 CTGCCACGGGCTCCCCAACTGAAGCCACTCTGGGGAGGGTCCGGCCACCAGAAAATTTGC 84 

Qy 401 TTGCATTTGCTTCCTGCTAGCCATGGGTGAGCTGCCCTTTCTGAGTCCAGAGGGAGCCAG 460 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I III II III 
Db 85 CCAGCTTTGCTGCCTGTTGGCCATGGGTGACCTCTCATCTTTGACCCCCGGAGGGTCCAT 144 

Qy 461 AGGGCCTCACATCAACAGAGGGTCTCTGAGCTCCCTGGAGCAAGGTTCGGTCACGGGCAC 520 

III II I I I I I I I I I I I I I II I I I I I I I I I I I I I I III I I I 
Db 145 GGGTCTCCAAGTAAACAGAGGCTCCCAGAGCTCCCTGGAGGGGGCTCCTGCCACCGCCCC 204 

Qy 521 AGAGGCTCGGCACAGCTTAGGTGTCCTGCATGTGTCCTACAGCGTCAGGTAAGGGG 576 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II 
Db 205 GGAGCCT---CACAGCCTGGGCATCCTCCATGCCTCCTACAGCGTCAGCCACCGCG 257 



RESULT 9 
US-09-837-992-8 

; Sequence 8, Application US/09837992 

; Patent No. US20020081687A1 

; GENERAL INFORMATION: 

; APPLICANT: Tian, Hui 

; APPLICANT: Schultz, Joshua 

; APPLICANT: Shan, Bei 

; APPLICANT: Tularik Inc. 

; TITLE OF INVENTION: Sitosterolemia Susceptibility Gene (SSG) : Compositions 

TITLE OF INVENTION: and Methods of Use 
; FILE REFERENCE: 018781-006020US 
; CURRENT APPLICATION NUMBER: US/09/837,992 
; CURRENT FILING DATE: 2001-04-18 
; PRIOR APPLICATION NUMBER: US 60/198,465 
; PRIOR FILING DATE: 2000-04-18 



PRIOR APPLICATION NUMBER: US 60/204,234 
; PRIOR FILING DATE: 2000-05-15 
; NUMBER OF SEQ ID NOS: 45 
; SOFTWARE: PatentIn Ver. 2.1 
; SEQ ID NO 8 

LENGTH: 122 
; TYPE: DNA 
; ORGANISM: Homo sapiens 

; FEATURE: 

OTHER INFORMATION: exon 2 of hSSG 
US-09-837-992-8 

Query Match 5.7%; Score 90; DB 9; Length 122; 

Best Local Similarity 83.6%; Pred. No. 1.3e-19; 

Matches 102; Conservative 0; Mismatches 20; Indels 0; Gaps 0; 

Qy       1164 CAACCGTGTCGGGCCTTGGTGGAACATCAAATCATGCCAGCAGAAGTGGGACAGGCAAAT 1223
              | |||| ||  |||| |||||| |||||| ||| |||| |||| |||||  |||||| ||
Db          1 CCACCGCGTGAGGCCCTGGTGGGACATCACATCTTGCCGGCAGCAGTGGACCAGGCAGAT 60

Qy       1224 CCTCAAAGATGTCTCCTTGTACATCGAGAGTGGCCAGATTATGTGCATCTTAGGCAGCTC 1283
              |||||||||||||||||||||| | ||||| || ||||| ||||||||| |||| |||||
Db         61 CCTCAAAGATGTCTCCTTGTACGTGGAGAGCGGGCAGATCATGTGCATCCTAGGAAGCTC 120

Qy       1284 AG 1285
              ||
Db        121 AG 122



RESULT 10 

US-09-989-981A-3/c 

Sequence 3, Application US/09989981A
Publication No. US20030049730A1
GENERAL INFORMATION:
APPLICANT: Hobbs, Helen H.
APPLICANT: Shan, Bei
APPLICANT: Barnes, Robert
APPLICANT: Tian, Hui
APPLICANT: Tularik Inc.
APPLICANT: Board of Regents, The University of Texas System
TITLE OF INVENTION: ABCG5 and ABCG8: Compositions and Methods of Use
FILE REFERENCE: 018781-007320US
CURRENT APPLICATION NUMBER: US/09/989,981A
CURRENT FILING DATE: 2002-07-23
PRIOR APPLICATION NUMBER: US 60/252,235
PRIOR FILING DATE: 2000-11-20
PRIOR APPLICATION NUMBER: US 60/253,645
PRIOR FILING DATE: 2000-11-28
NUMBER OF SEQ ID NOS: 13
SOFTWARE: PatentIn Ver. 2.1
SEQ ID NO 3
LENGTH: 2019
TYPE: DNA
ORGANISM: Mus musculus
FEATURE:
NAME/KEY: CDS
LOCATION: (1)..(2019)
OTHER INFORMATION: mouse ABCG8 (mABCG8)
US-09-989-981A-3 



Query Match 4.0%; Score 63; DB 10; Length 2019; 

Best Local Similarity 100.0%; Pred. No. 1e-09;

Matches 63; Conservative 0; Mismatches 0; Indels 0; Gaps 0;

Qy          1 CGAAGCATCCTGAAGTACAGTCCCATTCCACAGCTGGGTCTCTTCTTTGGTTTTCTCAGC 60
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db         63 CGAAGCATCCTGAAGTACAGTCCCATTCCACAGCTGGGTCTCTTCTTTGGTTTTCTCAGC 4

Qy         61 CAT 63
              |||
Db          3 CAT 1



RESULT 11 
US-10-142-426-412 

Sequence 412, Application US/10142426 
Publication No. US20040048333A1 
GENERAL INFORMATION: 
APPLICANT: Baker, Kevin P.
APPLICANT: Beresini, Maureen
APPLICANT: DeForge, Laura
APPLICANT: Desnoyers, Luc
APPLICANT: Filvaroff, Ellen
APPLICANT: Gao, Wei-Qiang
APPLICANT: Gerritsen, Mary E.
APPLICANT: Goddard, Audrey
APPLICANT: Godowski, Paul J.
APPLICANT: Gurney, Austin L.
APPLICANT: Sherwood, Steven
APPLICANT: Smith, Victoria
APPLICANT: Stewart, Timothy A.
APPLICANT: Tumas, Daniel
APPLICANT: Watanabe, Colin K
APPLICANT: Wood, William
APPLICANT: Zhang, Zemin
TITLE OF INVENTION: SECRETED AND TRANSMEMBRANE POLYPEPTIDES AND NUCLEIC
TITLE OF INVENTION: ACIDS ENCODING THE SAME
FILE REFERENCE: P3330R1C224
CURRENT APPLICATION NUMBER: US/10/142,426
CURRENT FILING DATE: 2002-05-09
Prior Application removed - See File Wrapper or Palm
NUMBER OF SEQ ID NOS: 550
SEQ ID NO 412 
LENGTH: 1184 
TYPE: PRT 

ORGANISM: Homo Sapien 
US-10-142-426-412 

Query Match 2.5%; Score 39.2; DB 13; Length 1184; 

Best Local Similarity 9.3%; Pred. No. 0.092; 

Matches 73; Conservative 219; Mismatches 484; Indels 5; Gaps 1;

Qy 449 AGAGGGAGCCAGAGGGCCTCACATCAACAGAGGGTCTCTGAGCTCCCTGGAGCAAGGTTC 508

Db 320 RRAGQSVSLCCKATGKPRPDKYFWYHNDTLLDPSLYKHESKLVLRKLQQHQAGEYFCKAQ 379

Qy 509 GGTCACGGGCACAGAGGCTCGGCACAGCTTAGGTGTCCTGCATGTGTCCTACAGCGTCAG 568 

: : | : : : : : | : : : I : : : : I II : : : : I I I I 

Db 380 SDAGAVKSKVAQLIVTASDETPCNPVPESYLIRLPHDCFQNATNSFYYDVGRCPVKTCAG 439 

Qy 569 GTAAGGGGACCTCCACAGCAAAAAGCTAGGCTCTCTGATTGCCTTTTCTGAATGGG---- 624

: | : : | : : :|:| ::|: :| I |: :|: 

Db 440 QQDNGIRCRDAVQNCCGISKTEEREIQCSGYTLPTKVAKECSCQRCTETRSIVRGRVSAA 499 

Qy 625 -TGGGTGGGCCTGTGGGCTTTGGGTTGTCTGTCCAGCAGATCAGGGTGAAAGTGGACAGT 683 

| : | : : : : : : I I : I I : : : | : : : : : I : : : 

Db 500 DNGEPMRFGHVYMGNSRVSMTGYKGTFTLHVPQDTERLVLTFVDRLQKFVNTTKVLPFNK 559 

Qy 684 CTGTAACAACAGTGAGTCGTTCCTCCTCCTCCTCCTGCGCAGGGCAGAGCCTGGACATTA 743 

Db 560 KGSAVFHEIKMLRRKEPITLEAMETNIIPLGEVVGEDPMAELEIPSRSFYRQNGEPYIGK 619

Qy 744 AAACATGCCCTGCCTGAAGCCGCTTGCTGCTTCTCACTGATTTCTGCTCTCCCCTTCCTT 803 

: |::| :| I : : :| : :|: : :|: : 

Db 620 VKASVTFLDPRNISTATAAQTDLNFINDEGDTFPLRTYGMFSVDFRDEVTSEPLNAGKVK 679 

Qy 804 GACTCGCCCACCACCTGTCCTGTGTAGATGGAGAAGGCTCGGAGAGTGGGGGTGCTGGGG 863

Db 680 VHLDSTQVKMPEHISTVKLWSLNPDTGLWEEEGDFKFENQRRNKREDRTFLVGNLEIRER 739

Qy 864 GCACAAAATGGAATGAACACTGCTGAAGGAATGCAGGGTTCACTTCAAGAAGAAAGCAGT 923

: :::::::|:: : I : : : : I : : | : : : : 

Db 740 RLFNLDVPESRRCFVKVRAYRSERFLPSEQIQGWISVINLEPRTGFLSNPRAWGRFDSV 799

Qy 924 GTGCAGGTGTACCATCTCCCAGTCAGAGACCCAGTAATCAGAGCAGCTAATGGGAGGCAT 983

II I : I I : : : : : I : I : : : : : 

Db 800 ITGPNGACVPAFCDDQSPDAYSAYVLASLAGEELQAVESSPKFNPNAIGVPQPYLNKLNY 859 

Qy 984 GCTCCTTGGGTGGTGGCCAACTTGTCATTATACCTCCAAGGACAACAGAGTGGTACATAA 1043 

: | : : :::: ::| :| : : I I : 

Db 860 RRTDHEDPRVKKTAFQISMAKPRPNSAEESNGPIYAFENLRACEEAPPSAAHFRFYQIEG 919 

Qy 1044 GGCTAAAACAGAGTTGTCAACCTGTCCAGGGGCAACTGGGATGGGGTAGGGCTGGGAGCA 1103 

Db 920 DRYDYNTVPFNEDDPMSWTEDYLAWWPKPMEFRACYIKVKIVGPLEVNVRSRNMGGTHRR 979

Qy 1104 GGGGTCTGGCACCTTCCAGGACCCTACTCTGCCTTTGCCCTTGTGGGATTTCCTTTAAAG 1163 

: | : : : : : I : : : : : : : : : I : : : : I : 

Db 980 TVGKLYGIRDVRSTRDRDQPNVSAACLEFKCSGMLYDQDRVDRTLVKVIPQGSCRRASVN 1039

Qy 1164 CAACCGTGTCGGGCCTTGGTGGAACATCAAATCATGCCAGCAGAAGTGGGACAGGCAAAT 1223

: : : : : : : I : : : I : : : : : I : I : I : I 

Db 1040 PMLHEYLVNHLPLAVNNDTSEYTMLAPLDPLGHNYGIYTVTDQDPRTAKEIALGRCFDGT 1099 

Qy 1224 C 1224 

Db 1100 S 1100 



RESULT 12 
US-10-123-155-412 



Sequence 412, Application US/10123155
Publication No. US20030068794A1
GENERAL INFORMATION:
APPLICANT: Baker, Kevin P.
APPLICANT: Beresini, Maureen
APPLICANT: DeForge, Laura
APPLICANT: Desnoyers, Luc
APPLICANT: Filvaroff, Ellen
APPLICANT: Gao, Wei-Qiang
APPLICANT: Gerritsen, Mary E.
APPLICANT: Goddard, Audrey
APPLICANT: Godowski, Paul J.
APPLICANT: Gurney, Austin L.
APPLICANT: Sherwood, Steven
APPLICANT: Smith, Victoria
APPLICANT: Stewart, Timothy A.
APPLICANT: Tumas, Daniel
APPLICANT: Watanabe, Colin K
APPLICANT: Wood, William
APPLICANT: Zhang, Zemin
TITLE OF INVENTION: SECRETED AND TRANSMEMBRANE POLYPEPTIDES AND NUCLEIC
TITLE OF INVENTION: ACIDS ENCODING THE SAME
FILE REFERENCE: P3330R1C30
CURRENT APPLICATION NUMBER: US/10/123,155
CURRENT FILING DATE: 2002-04-15
Prior Application removed - See Palm or File Wrapper
NUMBER OF SEQ ID NOS: 550
SEQ ID NO 412 
LENGTH: 1184 
TYPE: PRT 

ORGANISM: Homo Sapien 
US-10-123-155-412 



Query Match 2.5%; Score 39.2; DB 15; Length 1184; 

Best Local Similarity 9.3%; Pred. No. 0.092; 

Matches 73; Conservative 219; Mismatches 484; Indels 5; Gaps 1;



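Qy 449 AGAGGGAGCCAGAGGGCCTCACATCAACAGAGGGTCTCTGAGCTCCCTGGAGCAAGGTTC 508

Db 320 RRAGQSVSLCCKATGKPRPDKYFWYHNDTLLDPSLYKHESKLVLRKLQQHQAGEYFCKAQ 379

Qy 509 GGTCACGGGCACAGAGGCTCGGCACAGCTTAGGTGTCCTGCATGTGTCCTACAGCGTCAG 568

Db 380 SDAGAVKSKVAQLIVTASDETPCNPVPESYLIRLPHDCFQNATNSFYYDVGRCPVKTCAG 439

Qy 569 GTAAGGGGACCTCCACAGCAAAAAGCTAGGCTCTCTGATTGCCTTTTCTGAATGGG---- 624

Db 440 QQDNGIRCRDAVQNCCGISKTEEREIQCSGYTLPTKVAKECSCQRCTETRSIVRGRVSAA 499

Qy 625 -TGGGTGGGCCTGTGGGCTTTGGGTTGTCTGTCCAGCAGATCAGGGTGAAAGTGGACAGT 683

Db 500 DNGEPMRFGHVYMGNSRVSMTGYKGTFTLHVPQDTERLVLTFVDRLQKFVNTTKVLPFNK 559

Qy 684 CTGTAACAACAGTGAGTCGTTCCTCCTCCTCCTCCTGCGCAGGGCAGAGCCTGGACATTA 743

Db 560 KGSAVFHEIKMLRRKEPITLEAMETNIIPLGEVVGEDPMAELEIPSRSFYRQNGEPYIGK 619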

Qy 744 AAACATGCCCTGCCTGAAGCCGCTTGCTGCTTCTCACTGATTTCTGCTCTCCCCTTCCTT 803 

: | : : | : | | : : : I : : I : : : I : : 

Db 620 VKASVTFLDPRNISTATAAQTDLNFINDEGDTFPLRTYGMFSVDFRDEVTSEPLNAGKVK 679

Qy 804 GACTCGCCCACCACCTGTCCTGTGTAGATGGAGAAGGCTCGGAGAGTGGGGGTGCTGGGG 863 

Db 680 VHLDSTQVKMPEHISTVKLWSLNPDTGLWEEEGDFKFENQRRNKREDRTFLVGNLEIRER 739 

Qy 864 GCACAAAATGGAATGAACACTGCTGAAGGAATGCAGGGTTCACTTCAAGAAGAAAGCAGT 923

Db 740 RLFNLDVPESRRCFVKVRAYRSERFLPSEQIQGWISVINLEPRTGFLSNPRAWGRFDSV 799

Qy 924 GTGCAGGTGTACCATCTCCCAGTCAGAGACCCAGTAATCAGAGCAGCTAATGGGAGGCAT 983

I I I : I I : : : : : I : I : : : : : 

Db 800 ITGPNGACVPAFCDDQSPDAYSAYVLASLAGEELQAVESSPKFNPNAIGVPQPYLNKLNY 859 

Qy 984 GCTCCTTGGGTGGTGGCCAACTTGTCATTATACCTCCAAGGACAACAGAGTGGTACATAA 1043 

: | : : : : : : : : I : I : : I I : 

Db 860 RRTDHEDPRVKKTAFQISMAKPRPNSAEESNGPIYAFENLRACEEAPPSAAHFRFYQIEG 919 

Qy 1044 GGCTAAAACAGAGTTGTCAACCTGTCCAGGGGCAACTGGGATGGGGTAGGGCTGGGAGCA 1103 

Db 920 DRYDYNTVPFNEDDPMSWTEDYLAWWPKPMEFRACYIKVKIVGPLEVNVRSRNMGGTHRR 979

Qy 1104 GGGGTCTGGCACCTTCCAGGACCCTACTCTGCCTTTGCCCTTGTGGGATTTCCTTTAAAG 1163 

: | : : : : : I : : : : : : : ::]::: : I : 

Db 980 TVGKLYGIRDVRSTRDRDQPNVSAACLEFKCSGMLYDQDRVDRTLVKVIPQGSCRRASVN 1039 

Qy 1164 CAACCGTGTCGGGCCTTGGTGGAACATCAAATCATGCCAGCAGAAGTGGGACAGGCAAAT 1223

: : : : : : : I : : : I : : : : : I : I : I : I 

Db 1040 PMLHEYLVNHLPLAVNNDTSEYTMLAPLDPLGHNYGIYTVTDQDPRTAKEIALGRCFDGT 1099 

Qy 1224 C 1224 

Db 1100 S 1100 



RESULT 13 
US-10-146-731-412 

Sequence 412, Application US/10146731 
Publication No. US20030129692A1
GENERAL INFORMATION: 



APPLICANT: Baker, Kevin P.
APPLICANT: Beresini, Maureen
APPLICANT: DeForge, Laura
APPLICANT: Desnoyers, Luc
APPLICANT: Filvaroff, Ellen
APPLICANT: Gao, Wei-Qiang
APPLICANT: Gerritsen, Mary E.
APPLICANT: Goddard, Audrey
APPLICANT: Godowski, Paul J.
APPLICANT: Gurney, Austin L.
APPLICANT: Sherwood, Steven
APPLICANT: Smith, Victoria
APPLICANT: Stewart, Timothy A.
APPLICANT: Tumas, Daniel
APPLICANT: Watanabe, Colin K
APPLICANT: Wood, William
APPLICANT: Zhang, Zemin

TITLE OF INVENTION: SECRETED AND TRANSMEMBRANE POLYPEPTIDES AND NUCLEIC 
TITLE OF INVENTION: ACIDS ENCODING THE SAME 
FILE REFERENCE: P3330R1C323 

CURRENT APPLICATION NUMBER: US/10/146,731 
CURRENT FILING DATE: 2002-05-15 

Prior Application removed - See File Wrapper or Palm 
NUMBER OF SEQ ID NOS : 550 
SEQ ID NO 412 
LENGTH: 1184 
TYPE: PRT 

ORGANISM: Homo Sapien 
US-10-146-731-412 

Query Match 2.5%; Score 39.2; DB 15; Length 1184; 

Best Local Similarity 9.3%; Pred. No. 0.092; 

Matches 73; Conservative 219; Mismatches 484; Indels 5; Gaps 1; 

Qy 449 AGAGGGAGCCAGAGGGCCTCACATCAACAGAGGGTCTCTGAGCTCCCTGGAGCAAGGTTC 508

::|| ::: I :| |: : ::: ::::::: : 

Db 320 RRAGQSVSLCCKATGKPRPDKYFWYHNDTLLDPSLYKHESKLVLRKLQQHQAGEYFCKAQ 379

Qy 509 GGTCACGGGCACAGAGGCTCGGCACAGCTTAGGTGTCCTGCATGTGTCCTACAGCGTCAG 568 

: : | : : : : : | : : : I : : : : I II : : —MM 

Db 380 SDAGAVKSKVAQLIVTASDETPCNPVPESYLIRLPHDCFQNATNSFYYDVGRCPVKTCAG 439

Qy 569 GTAAGGGGACCTCCACAGCAAAAAGCTAGGCTCTCTGATTGCCTTTTCTGAATGGG---- 624

: | : : | : : : I : I : : I : : I I M : I : 

Db 440 QQDNGIRCRDAVQNCCGISKTEEREIQCSGYTLPTKVAKECSCQRCTETRSIVRGRVSAA 499 

Qy 625 -TGGGTGGGCCTGTGGGCTTTGGGTTGTCTGTCCAGCAGATCAGGGTGAAAGTGGACAGT 683 

| : | : : : : : : | | : I I : : : | : : : : : I : : : 

Db 500 DNGEPMRFGHVYMGNSRVSMTGYKGTFTLHVPQDTERLVLTFVDRLQKFVNTTKVLPFNK 559

Qy 684 CTGTAACAACAGTGAGTCGTTCCTCCTCCTCCTCCTGCGCAGGGCAGAGCCTGGACATTA 743

: : : : : : : : : : | 

Db 560 KGSAVFHEIKMLRRKEPITLEAMETNIIPLGEVVGEDPMAELEIPSRSFYRQNGEPYIGK 619

Qy 744 AAACATGCCCTGCCTGAAGCCGCTTGCTGCTTCTCACTGATTTCTGCTCTCCCCTTCCTT 803 

: | : : | : I I : : : I : : i : : : I : : 

Db 620 VKASVTFLDPRNISTATAAQTDLNFINDEGDTFPLRTYGMFSVDFRDEVTSEPLNAGKVK 679 

Qy 804 GACTCGCCCACCACCTGTCCTGTGTAGATGGAGAAGGCTCGGAGAGTGGGGGTGCTGGGG 863

: : : : : : : : | : : : I : I : : :::::: : : : 

Db 680 VHLDSTQVKMPEHISTVKLWSLNPDTGLWEEEGDFKFENQRRNKREDRTFLVGNLEIRER 739 

Qy 864 GCACAAAATGGAATGAACACTGCTGAAGGAATGCAGGGTTCACTTCAAGAAGAAAGCAGT 923

: ::::::: | : : : I : : : : I : : | : : : : 

Db 740 RLFNLDVPESRRCFVKVRAYRSERFLPSEQIQGWISVINLEPRTGFLSNPRAWGRFDSV 799

Qy 924 GTGCAGGTGTACCATCTCCCAGTCAGAGACCCAGTAATCAGAGCAGCTAATGGGAGGCAT 983

M I : I I:: : :: |: M : : : : 

Db 800 ITGPNGACVPAFCDDQSPDAYSAYVLASLAGEELQAVESSPKFNPNAIGVPQPYLNKLNY 859

Qy 984 GCTCCTTGGGTGGTGGCCAACTTGTCATTATACCTCCAAGGACAACAGAGTGGTACATAA 1043 

: I : : : : : : : : | : I : Ml : 



Db 860 RRTDHEDPRVKKTAFQISMAKPRPNSAEESNGPIYAFENLRACEEAPPSAAHFRFYQIEG 919

Qy 1044 GGCTAAAACAGAGTTGTCAACCTGTCCAGGGGCAACTGGGATGGGGTAGGGCTGGGAGCA 1103

Db 920 DRYDYNTVPFNEDDPMSWTEDYLAWWPKPMEFRACYIKVKIVGPLEVNVRSRNMGGTHRR 979

Qy 1104 GGGGTCTGGCACCTTCCAGGACCCTACTCTGCCTTTGCCCTTGTGGGATTTCCTTTAAAG 1163

Db 980 TVGKLYGIRDVRSTRDRDQPNVSAACLEFKCSGMLYDQDRVDRTLVKVIPQGSCRRASVN 1039

Qy 1164 CAACCGTGTCGGGCCTTGGTGGAACATCAAATCATGCCAGCAGAAGTGGGACAGGCAAAT 1223

: : : : : : : I : : : I : : : : : I : I : I : I

Db 1040 PMLHEYLVNHLPLAVNNDTSEYTMLAPLDPLGHNYGIYTVTDQDPRTAKEIALGRCFDGT 1099

Qy 1224 C 1224

Db 1100 S 1100



RESULT 14 
US-10-140-472-412 

Sequence 412, Application US/10140472 
Publication No. US20030138888A1
GENERAL INFORMATION: 
APPLICANT: Baker, Kevin P.
APPLICANT: Beresini, Maureen
APPLICANT: DeForge, Laura
APPLICANT: Desnoyers, Luc
APPLICANT: Filvaroff, Ellen
APPLICANT: Gao, Wei-Qiang
APPLICANT: Gerritsen, Mary E.
APPLICANT: Goddard, Audrey
APPLICANT: Godowski, Paul J.
APPLICANT: Gurney, Austin L.
APPLICANT: Sherwood, Steven
APPLICANT: Smith, Victoria
APPLICANT: Stewart, Timothy A.
APPLICANT: Tumas, Daniel
APPLICANT: Watanabe, Colin K
APPLICANT: Wood, William
APPLICANT: Zhang, Zemin
TITLE OF INVENTION: SECRETED AND TRANSMEMBRANE POLYPEPTIDES AND NUCLEIC
TITLE OF INVENTION: ACIDS ENCODING THE SAME
FILE REFERENCE: P3330R1C168
CURRENT APPLICATION NUMBER: US/10/140,472
CURRENT FILING DATE: 2002-05-06
Prior Application removed - See File Wrapper or Palm
NUMBER OF SEQ ID NOS: 550
SEQ ID NO 412 
LENGTH: 1184 
TYPE: PRT 

ORGANISM: Homo Sapien 
US-10-140-472-412 



Query Match 2.5%; Score 39.2; DB 15; Length 1184; 

Best Local Similarity 9.3%; Pred. No. 0.092; 

Matches 73; Conservative 219; Mismatches 484; Indels 5; Gaps 1;



Qy 449 AGAGGGAGCCAGAGGGCCTCACATCAACAGAGGGTCTCTGAGCTCCCTGGAGCAAGGTTC 508

::|| ::: | :| |: : ::: ::::::: : 

Db 320 RRAGQSVSLCCKATGKPRPDKYFWYHNDTLLDPSLYKHESKLVLRKLQQHQAGEYFCKAQ 379 

Qy 509 GGTCACGGGCACAGAGGCTCGGCACAGCTTAGGTGTCCTGCATGTGTCCTACAGCGTCAG 568 

:: |:::::| : :: I : : : : I II :: = : I f I 1 

Db 380 SDAGAVKSKVAQLIVTASDETPCNPVPESYLIRLPHDCFQNATNSFYYDVGRCPVKTCAG 439

Qy 569 GTAAGGGGACCTCCACAGCAAAAAGCTAGGCTCTCTGATTGCCTTTTCTGAATGGG---- 624

: I : : I : : : I : I : : I : : I I h : I : 

Db 440 QQDNGIRCRDAVQNCCGISKTEEREIQCSGYTLPTKVAKECSCQRCTETRSIVRGRVSAA 499 

Qy 625 -TGGGTGGGCCTGTGGGCTTTGGGTTGTCTGTCCAGCAGATCAGGGTGAAAGTGGACAGT 683

| : | : : : : : : I I : I I : : : I : : : : : I : : : 

Db 500 DNGEPMRFGHVYMGNSRVSMTGYKGTFTLHVPQDTERLVLTFVDRLQKFVNTTKVLPFNK 559 

Qy 684 CTGTAACAACAGTGAGTCGTTCCTCCTCCTCCTCCTGCGCAGGGCAGAGCCTGGACATTA 743 

Db 560 KGSAVFHEIKMLRRKEPITLEAMETNIIPLGEVVGEDPMAELEIPSRSFYRQNGEPYIGK 619

Qy 744 AAACATGCCCTGCCTGAAGCCGCTTGCTGCTTCTCACTGATTTCTGCTCTCCCCTTCCTT 803

: | : : I : I I : : : I : : I : : : I : : 

Db 620 VKASVTFLDPRNISTATAAQTDLNFINDEGDTFPLRTYGMFSVDFRDEVTSEPLNAGKVK 679 

Qy 804 GACTCGCCCACCACCTGTCCTGTGTAGATGGAGAAGGCTCGGAGAGTGGGGGTGCTGGGG 863

Db 680 VHLDSTQVKMPEHISTVKLWSLNPDTGLWEEEGDFKFENQRRNKREDRTFLVGNLEIRER 739

Qy 864 GCACAAAATGGAATGAACACTGCTGAAGGAATGCAGGGTTCACTTCAAGAAGAAAGCAGT 923

Db 740 RLFNLDVPESRRCFVKVRAYRSERFLPSEQIQGWISVINLEPRTGFLSNPRAWGRFDSV 799

Qy 924 GTGCAGGTGTACCATCTCCCAGTCAGAGACCCAGTAATCAGAGCAGCTAATGGGAGGCAT 983

I I I : I I : : : : : | : I : : : : : 

Db 800 ITGPNGACVPAFCDDQSPDAYSAYVLASLAGEELQAVESSPKFNPNAIGVPQPYLNKLNY 859 

Qy 984 GCTCCTTGGGTGGTGGCCAACTTGTCATTATACCTCCAAGGACAACAGAGTGGTACATAA 1043 

: | : : :::: ::| : I : : I I : 

Db 860 RRTDHEDPRVKKTAFQISMAKPRPNSAEESNGPIYAFENLRACEEAPPSAAHFRFYQIEG 919 

Qy 1044 GGCTAAAACAGAGTTGTCAACCTGTCCAGGGGCAACTGGGATGGGGTAGGGCTGGGAGCA 1103 

Db 920 DRYDYNTVPFNEDDPMSWTEDYLAWWPKPMEFRACYIKVKIVGPLEVNVRSRNMGGTHRR 979

Qy 1104 GGGGTCTGGCACCTTCCAGGACCCTACTCTGCCTTTGCCCTTGTGGGATTTCCTTTAAAG 1163

:|: : : : :| : :: : : :: ::| ::: : I : 

Db 980 TVGKLYGIRDVRSTRDRDQPNVSAACLEFKCSGMLYDQDRVDRTLVKVIPQGSCRRASVN 1039

Qy 1164 CAACCGTGTCGGGCCTTGGTGGAACATCAAATCATGCCAGCAGAAGTGGGACAGGCAAAT 1223

Db 1040 PMLHEYLVNHLPLAVNNDTSEYTMLAPLDPLGHNYGIYTVTDQDPRTAKEIALGRCFDGT 1099

Qy 1224 C 1224 

Db 1100 S 1100 



RESULT 15 
US-10-141-761-412 

Sequence 412, Application US/10141761 
Publication No. US20030148432A1 
GENERAL INFORMATION: 
APPLICANT: Baker, Kevin P.
APPLICANT: Beresini, Maureen
APPLICANT: DeForge, Laura
APPLICANT: Desnoyers, Luc
APPLICANT: Filvaroff, Ellen
APPLICANT: Gao, Wei-Qiang
APPLICANT: Gerritsen, Mary E.
APPLICANT: Goddard, Audrey
APPLICANT: Godowski, Paul J.
APPLICANT: Gurney, Austin L.
APPLICANT: Sherwood, Steven
APPLICANT: Smith, Victoria
APPLICANT: Stewart, Timothy A.
APPLICANT: Tumas, Daniel
APPLICANT: Watanabe, Colin K
APPLICANT: Wood, William
APPLICANT: Zhang, Zemin
TITLE OF INVENTION: SECRETED AND TRANSMEMBRANE POLYPEPTIDES AND NUCLEIC
TITLE OF INVENTION: ACIDS ENCODING THE SAME
FILE REFERENCE: P3330R1C198
CURRENT APPLICATION NUMBER: US/10/141,761
CURRENT FILING DATE: 2002-05-08
Prior Application removed - See Palm or File Wrapper
NUMBER OF SEQ ID NOS: 550
SEQ ID NO 412 
LENGTH: 1184 
TYPE: PRT 

ORGANISM: Homo Sapien 
US-10-141-761-412 

Query Match 2.5%; Score 39.2; DB 15; Length 1184; 

Best Local Similarity 9.3%; Pred. No. 0.092; 

Matches 73; Conservative 219; Mismatches 484; Indels 5; Gaps 1;



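Qy 449 AGAGGGAGCCAGAGGGCCTCACATCAACAGAGGGTCTCTGAGCTCCCTGGAGCAAGGTTC 508

Db 320 RRAGQSVSLCCKATGKPRPDKYFWYHNDTLLDPSLYKHESKLVLRKLQQHQAGEYFCKAQ 379

Qy 509 GGTCACGGGCACAGAGGCTCGGCACAGCTTAGGTGTCCTGCATGTGTCCTACAGCGTCAG 568

Db 380 SDAGAVKSKVAQLIVTASDETPCNPVPESYLIRLPHDCFQNATNSFYYDVGRCPVKTCAG 439

Qy 569 GTAAGGGGACCTCCACAGCAAAAAGCTAGGCTCTCTGATTGCCTTTTCTGAATGGG---- 624

Db 440 QQDNGIRCRDAVQNCCGISKTEEREIQCSGYTLPTKVAKECSCQRCTETRSIVRGRVSAA 499

Qy 625 -TGGGTGGGCCTGTGGGCTTTGGGTTGTCTGTCCAGCAGATCAGGGTGAAAGTGGACAGT 683

Db 500 DNGEPMRFGHVYMGNSRVSMTGYKGTFTLHVPQDTERLVLTFVDRLQKFVNTTKVLPFNK 559

Qy 684 CTGTAACAACAGTGAGTCGTTCCTCCTCCTCCTCCTGCGCAGGGCAGAGCCTGGACATTA 743

Db 560 KGSAVFHEIKMLRRKEPITLEAMETNIIPLGEVVGEDPMAELEIPSRSFYRQNGEPYIGK 619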



Qy 744 AAACATGCCCTGCCTGAAGCCGCTTGCTGCTTCTCACTGATTTCTGCTCTCCCCTTCCTT 803

: |::| :| I : : :| : :|: : :|: : 

Db 620 VKASVTFLDPRNISTATAAQTDLNFINDEGDTFPLRTYGMFSVDFRDEVTSEPLNAGKVK 679 

Qy 804 GACTCGCCCACCACCTGTCCTGTGTAGATGGAGAAGGCTCGGAGAGTGGGGGTGCTGGGG 863

:::::::: I : : : I : I : : :::::: : : : 

Db 680 VHLDSTQVKMPEHISTVKLWSLNPDTGLWEEEGDFKFENQRRNKREDRTFLVGNLEIRER 739 

Qy 864 GCACAAAATGGAATGAACACTGCTGAAGGAATGCAGGGTTCACTTCAAGAAGAAAGCAGT 923

: ::::::: | : : : | : : : : I : : I : : : : 

Db 740 RLFNLDVPESRRCFVKVRAYRSERFLPSEQIQGWISVINLEPRTGFLSNPRAWGRFDSV 799 

Qy 924 GTGCAGGTGTACCATCTCCCAGTCAGAGACCCAGTAATCAGAGCAGCTAATGGGAGGCAT 983

| | I : I I : : : : : I : I : : : : : 

Db 800 ITGPNGACVPAFCDDQSPDAYSAYVLASLAGEELQAVESSPKFNPNAIGVPQPYLNKLNY 859

Qy 984 GCTCCTTGGGTGGTGGCCAACTTGTCATTATACCTCCAAGGACAACAGAGTGGTACATAA 1043 

:]:::::: :: I : I : : I I : 

Db 860 RRTDHEDPRVKKTAFQISMAKPRPNSAEESNGPIYAFENLRACEEAPPSAAHFRFYQIEG 919

Qy 1044 GGCTAAAACAGAGTTGTCAACCTGTCCAGGGGCAACTGGGATGGGGTAGGGCTGGGAGCA 1103

Db 920 DRYDYNTVPFNEDDPMSWTEDYLAWWPKPMEFRACYIKVKIVGPLEVNVRSRNMGGTHRR 979

Qy 1104 GGGGTCTGGCACCTTCCAGGACCCTACTCTGCCTTTGCCCTTGTGGGATTTCCTTTAAAG 1163 

: | : : : : : | : : : : : : : : : I : : : : I : 

Db 980 TVGKLYGIRDVRSTRDRDQPNVSAACLEFKCSGMLYDQDRVDRTLVKVIPQGSCRRASVN 1039

Qy 1164 CAACCGTGTCGGGCCTTGGTGGAACATCAAATCATGCCAGCAGAAGTGGGACAGGCAAAT 1223 

: : : : : : : I : : : I : : : : : I : I : I : I 

Db 1040 PMLHEYLVNHLPLAVNNDTSEYTMLAPLDPLGHNYGIYTVTDQDPRTAKEIALGRCFDGT 1099

Qy 1224 C 1224 

Db 1100 S 1100 



RESULT 16 
US-10-142-885-412 

Sequence 412, Application US/10142885 
Publication No. US20030157604A1 
GENERAL INFORMATION: 
APPLICANT: Baker, Kevin P.
APPLICANT: Beresini, Maureen
APPLICANT: DeForge, Laura
APPLICANT: Desnoyers, Luc
APPLICANT: Filvaroff, Ellen
APPLICANT: Gao, Wei-Qiang
APPLICANT: Gerritsen, Mary E.
APPLICANT: Goddard, Audrey
APPLICANT: Godowski, Paul J.
APPLICANT: Gurney, Austin L.
APPLICANT: Sherwood, Steven
APPLICANT: Smith, Victoria
APPLICANT: Stewart, Timothy A.
APPLICANT: Tumas, Daniel
APPLICANT: Watanabe, Colin K
APPLICANT: Wood, William
APPLICANT: Zhang, Zemin
TITLE OF INVENTION: SECRETED AND TRANSMEMBRANE POLYPEPTIDES AND NUCLEIC
TITLE OF INVENTION: ACIDS ENCODING THE SAME
FILE REFERENCE: P3330R1C248
CURRENT APPLICATION NUMBER: US/10/142,885
CURRENT FILING DATE: 2002-05-10
Prior Application removed - See File Wrapper or Palm
NUMBER OF SEQ ID NOS: 550
SEQ ID NO 412
LENGTH: 1184
TYPE: PRT
ORGANISM: Homo Sapien
US-10-142-885-412 



Query Match 2.5%; Score 39.2; DB 15; Length 1184; 

Best Local Similarity 9.3%; Pred. No. 0.092; 

Matches 73; Conservative 219; Mismatches 484; Indels 5; Gaps 1;



Qy 449 AGAGGGAGCCAGAGGGCCTCACATCAACAGAGGGTCTCTGAGCTCCCTGGAGCAAGGTTC 508

::| | ::: 1 :| I: : ::: ::::::: :

Db 320 RRAGQSVSLCCKATGKPRPDKYFWYHNDTLLDPSLYKHESKLVLRKLQQHQAGEYFCKAQ 379

Qy 509 GGTCACGGGCACAGAGGCTCGGCACAGCTTAGGTGTCCTGCATGTGTCCTACAGCGTCAG 568

: : | : : : : : I : : : 1 : : : : 1 II = = : : 1 1 1 1

Db 380 SDAGAVKSKVAQLIVTASDETPCNPVPESYLIRLPHDCFQNATNSFYYDVGRCPVKTCAG 439

Qy 569 GTAAGGGGACCTCCACAGCAAAAAGCTAGGCTCTCTGATTGCCTTTTCTGAATGGG---- 624

: | : : | : : : 1 : 1 : : I : : I 1 h : i :

Db 440 QQDNGIRCRDAVQNCCGISKTEEREIQCSGYTLPTKVAKECSCQRCTETRSIVRGRVSAA 499

Qy 625 -TGGGTGGGCCTGTGGGCTTTGGGTTGTCTGTCCAGCAGATCAGGGTGAAAGTGGACAGT 683

| : | : : : : : : | | : | I : : : I : : : : : 1 : : :

Db 500 DNGEPMRFGHVYMGNSRVSMTGYKGTFTLHVPQDTERLVLTFVDRLQKFVNTTKVLPFNK 559

Qy 684 CTGTAACAACAGTGAGTCGTTCCTCCTCCTCCTCCTGCGCAGGGCAGAGCCTGGACATTA 743

Db 560 KGSAVFHEIKMLRRKEPITLEAMETNIIPLGEVVGEDPMAELEIPSRSFYRQNGEPYIGK 619

Qy 744 AAACATGCCCTGCCTGAAGCCGCTTGCTGCTTCTCACTGATTTCTGCTCTCCCCTTCCTT 803

: | : : I : 1 1 : : : 1 : : 1 : : : 1 : :

Db 620 VKASVTFLDPRNISTATAAQTDLNFINDEGDTFPLRTYGMFSVDFRDEVTSEPLNAGKVK 679

Qy 804 GACTCGCCCACCACCTGTCCTGTGTAGATGGAGAAGGCTCGGAGAGTGGGGGTGCTGGGG 863

Db 680 VHLDSTQVKMPEHISTVKLWSLNPDTGLWEEEGDFKFENQRRNKREDRTFLVGNLEIRER 739

Qy 864 GCACAAAATGGAATGAACACTGCTGAAGGAATGCAGGGTTCACTTCAAGAAGAAAGCAGT 923

:: :::::|:: : |:: :: 1 : :l= = -

Db 740 RLFNLDVPESRRCFVKVRAYRSERFLPSEQIQGWISVINLEPRTGFLSNPRAWGRFDSV 799

Qy 924 GTGCAGGTGTACCATCTCCCAGTCAGAGACCCAGTAATCAGAGCAGCTAATGGGAGGCAT 983

I i 1 : 1 1 :: : : : I : 1 : : = : :

Db 800 ITGPNGACVPAFCDDQSPDAYSAYVLASLAGEELQAVESSPKFNPNAIGVPQPYLNKLNY 859


Qy 984 GCTCCTTGGGTGGTGGCCAACTTGTCATTATACCTCCAAGGACAACAGAGTGGTACATAA 1043 

Db 860 RRTDHEDPRVKKTAFQISMAKPRPNSAEESNGPIYAFENLRACEEAPPSAAHFRFYQIEG 919 

Qy 1044 GGCTAAAACAGAGTTGTCAACCTGTCCAGGGGCAACTGGGATGGGGTAGGGCTGGGAGCA 1103 

:::: : :: : : | : ::: | ::: II : 

Db 920 DRYDYNTVPFNEDDPMSWTEDYLAWWPKPMEFRACYIKVKIVGPLEVNVRSRNMGGTHRR 979 

Qy 1104 GGGGTCTGGCACCTTCCAGGACCCTACTCTGCCTTTGCCCTTGTGGGATTTCCTTTAAAG 1163 

Db 980 TVGKLYGIRDVRSTRDRDQPNVSAACLEFKCSGMLYDQDRVDRTLVKVIPQGSCRRASVN 1039 

Qy 1164 CAACCGTGTCGGGCCTTGGTGGAACATCAAATCATGCCAGCAGAAGTGGGACAGGCAAAT 1223 

: : : : : : : I : : : | : : : : : | : I : I : I 

Db 1040 PMLHEYLVNHLPLAVNNDTSEYTMLAPLDPLGHNYGIYTVTDQDPRTAKEIALGRCFDGT 1099

Qy 1224 C 1224 

Db 1100 S 1100 



RESULT 17 
US-10-158-790-412 

Sequence 412, Application US/10158790 
Publication No. US20030180879A1
GENERAL INFORMATION: 
APPLICANT: Baker, Kevin P.
APPLICANT: Beresini, Maureen
APPLICANT: DeForge, Laura
APPLICANT: Desnoyers, Luc
APPLICANT: Filvaroff, Ellen
APPLICANT: Gao, Wei-Qiang
APPLICANT: Gerritsen, Mary E.
APPLICANT: Goddard, Audrey
APPLICANT: Godowski, Paul J.
APPLICANT: Gurney, Austin L.
APPLICANT: Sherwood, Steven
APPLICANT: Smith, Victoria
APPLICANT: Stewart, Timothy A.
APPLICANT: Tumas, Daniel
APPLICANT: Watanabe, Colin K
APPLICANT: Wood, William
APPLICANT: Zhang, Zemin
TITLE OF INVENTION: SECRETED AND TRANSMEMBRANE POLYPEPTIDES AND NUCLEIC
TITLE OF INVENTION: ACIDS ENCODING THE SAME
FILE REFERENCE: P3330R1C448
CURRENT APPLICATION NUMBER: US/10/158,790
CURRENT FILING DATE: 2002-05-30
Prior Application removed - See File Wrapper or Palm
NUMBER OF SEQ ID NOS: 550
SEQ ID NO 412 
LENGTH: 1184 
TYPE: PRT 

ORGANISM: Homo Sapien 
US-10-158-790-412 



Query Match 2.5%; Score 39.2; DB 15; Length 1184; 

Best Local Similarity 9.3%; Pred. No. 0.092; 

Matches 73; Conservative 219; Mismatches 484; Indels 5; Gaps 1; 

Qy 449 AGAGGGAGCCAGAGGGCCTCACATCAACAGAGGGTCTCTGAGCTCCCTGGAGCAAGGTTC 508

: : | | : : : | : | | : : : : : ::::::: : 

Db 320 RRAGQSVSLCCKATGKPRPDKYFWYHNDTLLDPSLYKHESKLVLRKLQQHQAGEYFCKAQ 379

Qy 509 GGTCACGGGCACAGAGGCTCGGCACAGCTTAGGTGTCCTGCATGTGTCCTACAGCGTCAG 568 

: : | : : : : : I : : : I : : : : t II : : : : I II I 

Db 380 SDAGAVKSKVAQLIVTASDETPCNPVPESYLIRLPHDCFQNATNSFYYDVGRCPVKTCAG 439

Qy 569 GTAAGGGGACCTCCACAGCAAAAAGCTAGGCTCTCTGATTGCCTTTTCTGAATGGG---- 624

: I : : I : : :|:| ::|: :| I |: :|: 

Db 440 QQDNGIRCRDAVQNCCGISKTEEREIQCSGYTLPTKVAKECSCQRCTETRSIVRGRVSAA 499 

Qy 625 -TGGGTGGGCCTGTGGGCTTTGGGTTGTCTGTCCAGCAGATCAGGGTGAAAGTGGACAGT 683

| : | : : : : : : I I : I I : : : I : : : : : I : : : 

Db 500 DNGEPMRFGHVYMGNSRVSMTGYKGTFTLHVPQDTERLVLTFVDRLQKFVNTTKVLPFNK 559 

Qy 684 CTGTAACAACAGTGAGTCGTTCCTCCTCCTCCTCCTGCGCAGGGCAGAGCCTGGACATTA 743

Db 560 KGSAVFHEIKMLRRKEPITLEAMETNIIPLGEVVGEDPMAELEIPSRSFYRQNGEPYIGK 619

Qy 744 AAACATGCCCTGCCTGAAGCCGCTTGCTGCTTCTCACTGATTTCTGCTCTCCCCTTCCTT 803 

: | : : | : I I : : : I : : I : : : I : : 

Db 620 VKASVTFLDPRNISTATAAQTDLNFINDEGDTFPLRTYGMFSVDFRDEVTSEPLNAGKVK 679

Qy 804 GACTCGCCCACCACCTGTCCTGTGTAGATGGAGAAGGCTCGGAGAGTGGGGGTGCTGGGG 863

:::::::: I : : : I : I : : :::::: : : : 

Db 680 VHLDSTQVKMPEHISTVKLWSLNPDTGLWEEEGDFKFENQRRNKREDRTFLVGNLEIRER 739 

Qy 864 GCACAAAATGGAATGAACACTGCTGAAGGAATGCAGGGTTCACTTCAAGAAGAAAGCAGT 923

Db 740 RLFNLDVPESRRCFVKVRAYRSERFLPSEQIQGWISVINLEPRTGFLSNPRAWGRFDSV 799 

Qy 924 GTGCAGGTGTACCATCTCCCAGTCAGAGACCCAGTAATCAGAGCAGCTAATGGGAGGCAT 983

M | : I I : : : : : I : I : : : : : 

Db 800 ITGPNGACVPAFCDDQSPDAYSAYVLASLAGEELQAVESSPKFNPNAIGVPQPYLNKLNY 859 

Qy 984 GCTCCTTGGGTGGTGGCCAACTTGTCATTATACCTCCAAGGACAACAGAGTGGTACATAA 1043 

: I : : : : : : : : I : I : : I I : 

Db 860 RRTDHEDPRVKKTAFQISMAKPRPNSAEESNGPIYAFENLRACEEAPPSAAHFRFYQIEG 919 

Qy 1044 GGCTAAAACAGAGTTGTCAACCTGTCCAGGGGCAACTGGGATGGGGTAGGGCTGGGAGCA 1103 

Db 920 DRYDYNTVPFNEDDPMSWTEDYLAWWPKPMEFRACYIKVKIVGPLEVNVRSRNMGGTHRR 979

Qy 1104 GGGGTCTGGCACCTTCCAGGACCCTACTCTGCCTTTGCCCTTGTGGGATTTCCTTTAAAG 1163 

Db 980 TVGKLYGIRDVRSTRDRDQPNVSAACLEFKCSGMLYDQDRVDRTLVKVIPQGSCRRASVN 1039

Qy 1164 CAACCGTGTCGGGCCTTGGTGGAACATCAAATCATGCCAGCAGAAGTGGGACAGGCAAAT 1223 

Db 1040 PMLHEYLVNHLPLAVNNDTSEYTMLAPLDPLGHNYGIYTVTDQDPRTAKEIALGRCFDGT 1099 

Qy 1224 C 1224

Db 1100 S 1100



RESULT 18 
US-10-137-871-412 

Sequence 412, Application US/10137871 
Publication No. US20030207350A1 
GENERAL INFORMATION: 
APPLICANT: Baker, Kevin P.
APPLICANT: Beresini, Maureen
APPLICANT: DeForge, Laura
APPLICANT: Desnoyers, Luc
APPLICANT: Filvaroff, Ellen
APPLICANT: Gao, Wei-Qiang
APPLICANT: Gerritsen, Mary E.
APPLICANT: Goddard, Audrey
APPLICANT: Godowski, Paul J.
APPLICANT: Gurney, Austin L.
APPLICANT: Sherwood, Steven
APPLICANT: Smith, Victoria
APPLICANT: Stewart, Timothy A.
APPLICANT: Tumas, Daniel
APPLICANT: Watanabe, Colin K
APPLICANT: Wood, William
APPLICANT: Zhang, Zemin
TITLE OF INVENTION: SECRETED AND TRANSMEMBRANE POLYPEPTIDES AND NUCLEIC
TITLE OF INVENTION: ACIDS ENCODING THE SAME
FILE REFERENCE: P3330R1C153
CURRENT APPLICATION NUMBER: US/10/137,871
CURRENT FILING DATE: 2002-05-03
Prior Application removed - See Palm or File Wrapper
NUMBER OF SEQ ID NOS: 550
SEQ ID NO 412 
LENGTH: 1184 
TYPE: PRT 

ORGANISM: Homo Sapien 
US-10-137-871-412 

Query Match 2.5%; Score 39.2; DB 16; Length 1184; 

Best Local Similarity 9.3%; Pred. No. 0.092; 

Matches 73; Conservative 219; Mismatches 484; Indels 5; Gaps 1;



Qy 449 AGAGGGAGCCAGAGGGCCTCACATCAACAGAGGGTCTCTGAGCTCCCTGGAGCAAGGTTC 508

: : M : : : I : I I : : : : : ::::::: :

Db 320 RRAGQSVSLCCKATGKPRPDKYFWYHNDTLLDPSLYKHESKLVLRKLQQHQAGEYFCKAQ 379

Qy 509 GGTCACGGGCACAGAGGCTCGGCACAGCTTAGGTGTCCTGCATGTGTCCTACAGCGTCAG 568

:: |:::::| : :: I : : : : I II :: ::l Mi

Db 380 SDAGAVKSKVAQLIVTASDETPCNPVPESYLIRLPHDCFQNATNSFYYDVGRCPVKTCAG 439

Qy 569 GTAAGGGGACCTCCACAGCAAAAAGCTAGGCTCTCTGATTGCCTTTTCTGAATGGG---- 624

: I : : | : : : I : I :: I : : I I h : I :

Db 440 QQDNGIRCRDAVQNCCGISKTEEREIQCSGYTLPTKVAKECSCQRCTETRSIVRGRVSAA 499

Qy 625 -TGGGTGGGCCTGTGGGCTTTGGGTTGTCTGTCCAGCAGATCAGGGTGAAAGTGGACAGT 683

I : |::: ::: || : I I :: : | ::: : : |:: :



Db 500 DNGEPMRFGHVYMGNSRVSMTGYKGTFTLHVPQDTERLVLTFVDRLQKFVNTTKVLPFNK 559

Qy 684 CTGTAACAACAGTGAGTCGTTCCTCCTCCTCCTCCTGCGCAGGGCAGAGCCTGGACATTA 743

Db 560 KGSAVFHEIKMLRRKEPITLEAMETNIIPLGEVVGEDPMAELEIPSRSFYRQNGEPYIGK 619

Qy 744 AAACATGCCCTGCCTGAAGCCGCTTGCTGCTTCTCACTGATTTCTGCTCTCCCCTTCCTT 803 

: | : : | : I I : : : I : : I : : : I : : 

Db 620 VKASVTFLDPRNISTATAAQTDLNFINDEGDTFPLRTYGMFSVDFRDEVTSEPLNAGKVK 679 

Qy 804 GACTCGCCCACCACCTGTCCTGTGTAGATGGAGAAGGCTCGGAGAGTGGGGGTGCTGGGG 863

: : : : : : : : | : : : | : | : : :::::: : : : 

Db 680 VHLDSTQVKMPEHISTVKLWSLNPDTGLWEEEGDFKFENQRRNKREDRTFLVGNLEIRER 739

Qy 864 GCACAAAATGGAATGAACACTGCTGAAGGAATGCAGGGTTCACTTCAAGAAGAAAGCAGT 923

: ::::::: | : : : I : : : : I : : I : : : : 

Db 740 RLFNLDVPESRRCFVKVRAYRSERFLPSEQIQGWISVINLEPRTGFLSNPRAWGRFDSV 799

Qy 924 GTGCAGGTGTACCATCTCCCAGTCAGAGACCCAGTAATCAGAGCAGCTAATGGGAGGCAT 983

I I I : I I : : : : : I : I : : : : : 

Db 800 ITGPNGACVPAFCDDQSPDAYSAYVLASLAGEELQAVESSPKFNPNAIGVPQPYLNKLNY 859

Qy 984 GCTCCTTGGGTGGTGGCCAACTTGTCATTATACCTCCAAGGACAACAGAGTGGTACATAA 1043 

: | : : :::: :: I : I : : I I : 

Db 860 RRTDHEDPRVKKTAFQISMAKPRPNSAEESNGPIYAFENLRACEEAPPSAAHFRFYQIEG 919

Qy 1044 GGCTAAAACAGAGTTGTCAACCTGTCCAGGGGCAACTGGGATGGGGTAGGGCTGGGAGCA 1103 

: : : : : : : : : I : : : : I : : : II : 

Db 920 DRYDYNTVPFNEDDPMSWTEDYLAWWPKPMEFRACYIKVKIVGPLEVNVRSRNMGGTHRR 979

Qy 1104 GGGGTCTGGCACCTTCCAGGACCCTACTCTGCCTTTGCCCTTGTGGGATTTCCTTTAAAG 1163

: | : : : : : I : : : : : : : : : | : : : : I : 

Db 980 TVGKLYGIRDVRSTRDRDQPNVSAACLEFKCSGMLYDQDRVDRTLVKVIPQGSCRRASVN 1039 

Qy 1164 CAACCGTGTCGGGCCTTGGTGGAACATCAAATCATGCCAGCAGAAGTGGGACAGGCAAAT 1223 

: : : : : : : I : : : | : : : : : I : I : I : I 

Db 1040 PMLHEYLVNHLPLAVNNDTSEYTMLAPLDPLGHNYGIYTVTDQDPRTAKEIALGRCFDGT 1099 

Qy 1224 C 1224 

Db 1100 S 1100 



RESULT 19 
US-10-140-923-412 

Sequence 412, Application US/10140923 
Publication No. US20030207355A1 
GENERAL INFORMATION: 
APPLICANT: Baker, Kevin P.
APPLICANT: Beresini, Maureen
APPLICANT: DeForge, Laura
APPLICANT: Desnoyers, Luc
APPLICANT: Filvaroff, Ellen
APPLICANT: Gao, Wei-Qiang
APPLICANT: Gerritsen, Mary E.
APPLICANT: Goddard, Audrey
APPLICANT: Godowski, Paul J.
APPLICANT: Gurney, Austin L.
APPLICANT: Sherwood, Steven
APPLICANT: Smith, Victoria
APPLICANT: Stewart, Timothy A.
APPLICANT: Tumas, Daniel
APPLICANT: Watanabe, Colin K
APPLICANT: Wood, William
APPLICANT: Zhang, Zemin
TITLE OF INVENTION: SECRETED AND TRANSMEMBRANE POLYPEPTIDES AND NUCLEIC
TITLE OF INVENTION: ACIDS ENCODING THE SAME
FILE REFERENCE: P3330R1C188
CURRENT APPLICATION NUMBER: US/10/140,923
CURRENT FILING DATE: 2002-05-07
Prior Application removed - See Palm or File Wrapper
NUMBER OF SEQ ID NOS: 550
SEQ ID NO 412 
LENGTH: 1184 
TYPE: PRT 

ORGANISM: Homo Sapien 
US-10-140-923-412 

Query Match 2.5%; Score 39.2; DB 16; Length 1184; 

Best Local Similarity 9.3%; Pred. No. 0.092; 

Matches 73; Conservative 219; Mismatches 484; Indels 5; Gaps 1; 

Qy 449 AGAGGGAGCCAGAGGGCCTCACATCAACAGAGGGTCTCTGAGCTCCCTGGAGCAAGGTTC 508 

: : | | : : : | : | | : : : : : ::::::: : 

Db 320 RRAGQSVSLCCKATGKPRPDKYFWYHNDTLLDPSLYKHESKLVLRKLQQHQAGEYFCKAQ 379

Qy 509 GGTCACGGGCACAGAGGCTCGGCACAGCTTAGGTGTCCTGCATGTGTCCTACAGCGTCAG 568 

: : | : : : : : I : : : I : : : : I II : : : : I I I I 

Db 380 SDAGAVKSKVAQLIVTASDETPCNPVPESYLIRLPHDCFQNATNSFYYDVGRCPVKTCAG 439 

Qy 569 GTAAGGGGACCTCCACAGCAAAAAGCTAGGCTCTCTGATTGCCTTTTCTGAATGGG---- 624

: | : : | : : : I : I : : I : : I I I : : I : 

Db 440 QQDNGIRCRDAVQNCCGISKTEEREIQCSGYTLPTKVAKECSCQRCTETRSIVRGRVSAA 499 

Qy 625 -TGGGTGGGCCTGTGGGCTTTGGGTTGTCTGTCCAGCAGATCAGGGTGAAAGTGGACAGT 683 

Db 500 DNGEPMRFGHVYMGNSRVSMTGYKGTFTLHVPQDTERLVLTFVDRLQKFVNTTKVLPFNK 559

Qy 684 CTGTAACAACAGTGAGTCGTTCCTCCTCCTCCTCCTGCGCAGGGCAGAGCCTGGACATTA 743 

Db 560 KGSAVFHEIKMLRRKEPITLEAMETNIIPLGEWGEDPMAELEIPSRSFYRQNGEPYIGK 619 

Qy 744 AAACATGCCCTGCCTGAAGCCGCTTGCTGCTTCTCACTGATTTCTGCTCTCCCCTTCCTT 803

: | : : | : I I : : : I : : I : : : I : : 

Db 620 VKASVTFLDPRNISTATAAQTDLNFINDEGDTFPLRTYGMFSVDFRDEVTSEPLNAGKVK 679

Qy 804 GACTCGCCCACCACCTGTCCTGTGTAGATGGAGAAGGCTCGGAGAGTGGGGGTGCTGGGG 863 

Db 680 VHLDSTQVKMPEHISTVKLWSLNPDTGLWEEEGDFKFENQRRNKREDRTFLVGNLEIRER 739 

Qy 864 GCACAAAATGGAATGAACACTGCTGAAGGAATGCAGGGTTCACTTCAAGAAGAAAGCAGT 923

Db 740 RLFNLDVPESRRCFVKVRAYRSERFLPSEQIQGWISVINLEPRTGFLSNPRAWGRFDSV 799 



Qy 924 GTGCAGGTGTACCATCTCCCAGTCAGAGACCCAGTAATCAGAGCAGCTAATGGGAGGCAT 983

| | I : I |:: : :: |: h : : : : 

Db 800 ITGPNGACVPAFCDDQSPDAYSAYVLASLAGEELQAVESSPKFNPNAIGVPQPYLNKLNY 859

Qy 984 GCTCCTTGGGTGGTGGCCAACTTGTCATTATACCTCCAAGGACAACAGAGTGGTACATAA 1043 

: | : : :::: : : I : I : : I I : 

Db 860 RRTDHEDPRVKKTAFQISMAKPRPNSAEESNGPIYAFENLRACEEAPPSAAHFRFYQIEG 919

Qy 1044 GGCTAAAACAGAGTTGTCAACCTGTCCAGGGGCAACTGGGATGGGGTAGGGCTGGGAGCA 1103

:::: : :: : : I : ::: I ::: II : 

Db 920 DRYDYNTVPFNEDDPMSWTEDYLAWWPKPMEFRACYIKVKIVGPLEVNVRSRNMGGTHRR 979

Qy 1104 GGGGTCTGGCACCTTCCAGGACCCTACTCTGCCTTTGCCCTTGTGGGATTTCCTTTAAAG 1163 

: | : : : : : I : : : : : : : : : | : : : : | : 

Db 980 TVGKLYGIRDVRSTRDRDQPNVSAACLEFKCSGMLYDQDRVDRTLVKVIPQGSCRRASVN 1039

Qy 1164 CAACCGTGTCGGGCCTTGGTGGAACATCAAATCATGCCAGCAGAAGTGGGACAGGCAAAT 1223 

: : : : : : : I : : : I : : : : : I : I : I : I 

Db 1040 PMLHEYLVNHLPLAVNNDTSEYTMLAPLDPLGHNYGIYTVTDQDPRTAKEIALGRCFDGT 1099 

Qy 1224 C 1224 

Db 1100 S 1100 



RESULT 20 
US-10-141-756-412 

Sequence 412, Application US/10141756 
Publication No. US20030207359A1 
GENERAL INFORMATION: 
APPLICANT: Baker, Kevin P.
APPLICANT: Beresini, Maureen
APPLICANT: DeForge, Laura
APPLICANT: Desnoyers, Luc
APPLICANT: Filvaroff, Ellen
APPLICANT: Gao, Wei-Qiang
APPLICANT: Gerritsen, Mary E.
APPLICANT: Goddard, Audrey
APPLICANT: Godowski, Paul J.
APPLICANT: Gurney, Austin L.
APPLICANT: Sherwood, Steven
APPLICANT: Smith, Victoria
APPLICANT: Stewart, Timothy A.
APPLICANT: Tumas, Daniel
APPLICANT: Watanabe, Colin K
APPLICANT: Wood, William
APPLICANT: Zhang, Zemin

TITLE OF INVENTION: SECRETED AND TRANSMEMBRANE POLYPEPTIDES AND NUCLEIC 
TITLE OF INVENTION: ACIDS ENCODING THE SAME 
FILE REFERENCE: P3330R1C200 

CURRENT APPLICATION NUMBER: US/10/141,756
CURRENT FILING DATE: 2002-05-08

Prior Application removed - See File Wrapper or Palm
NUMBER OF SEQ ID NOS: 550
SEQ ID NO 412 
LENGTH: 1184 
TYPE: PRT 



; ORGANISM: Homo Sapien 
US-10-141-756-412 

Query Match 2.5%; Score 39.2; DB 16; Length 1184; 

Best Local Similarity 9.3%; Pred. No. 0.092; 

Matches 73; Conservative 219; Mismatches 484; Indels 5; Gaps 1; 

Qy 449 AGAGGGAGCCAGAGGGCCTCACATCAACAGAGGGTCTCTGAGCTCCCTGGAGCAAGGTTC 508 

: : | | : : : | : I I : : : : : ::::::: : 

Db 320 RRAGQSVSLCCKATGKPRPDKYFWYHNDTLLDPSLYKHESKLVLRKLQQHQAGEYFCKAQ 379

Qy 509 GGTCACGGGCACAGAGGCTCGGCACAGCTTAGGTGTCCTGCATGTGTCCTACAGCGTCAG 568 

: : | : : : : : I : : : I : : : : I II : : : : I I I I 

Db 380 SDAGAVKSKVAQLIVTASDETPCNPVPESYLIRLPHDCFQNATNSFYYDVGRCPVKTCAG 439

Qy 569 GTAAGGGGACCTCCACAGCAAAAAGCTAGGCTCTCTGATTGCCTTTTCTGAATGGG 624 

: I : : I : : : I : I : : I : : I It: : I : 

Db 440 QQDNGIRCRDAVQNCCGISKTEEREIQCSGYTLPTKVAKECSCQRCTETRSIVRGRVSAA 499

Qy 625 -TGGGTGGGCCTGTGGGCTTTGGGTTGTCTGTCCAGCAGATCAGGGTGAAAGTGGACAGT 683 

Db 500 DNGEPMRFGHVYMGNSRVSMTGYKGTFTLHVPQDTERLVLTFVDRLQKFVNTTKVLPFNK 559 

Qy 684 CTGTAACAACAGTGAGTCGTTCCTCCTCCTCCTCCTGCGCAGGGCAGAGCCTGGACATTA 743 

: : : : : : : : : : I 

Db 560 KGSAVFHEIKMLRRKEPITLEAMETNIIPLGEWGEDPMAELEIPSRSFYRQNGEPYIGK 619 

Qy 744 AAACATGCCCTGCCTGAAGCCGCTTGCTGCTTCTCACTGATTTCTGCTCTCCCCTTCCTT 803 

: | : : | : | I : : : I : : I : : : I : : 

Db 620 VKASVTFLDPRNISTATAAQTDLNFINDEGDTFPLRTYGMFSVDFRDEVTSEPLNAGKVK 679

Qy 804 GACTCGCCCACCACCTGTCCTGTGTAGATGGAGAAGGCTCGGAGAGTGGGGGTGCTGGGG 863

:::::::: I : : : I : I : : :::::: : : : 

Db 680 VHLDSTQVKMPEHISTVKLWSLNPDTGLWEEEGDFKFENQRRNKREDRTFLVGNLEIRER 739

Qy 864 GCACAAAATGGAATGAACACTGCTGAAGGAATGCAGGGTTCACTTCAAGAAGAAAGCAGT 923

Db 740 RLFNLDVPESRRCFVKVRAYRSERFLPSEQIQGWISVINLEPRTGFLSNPRAWGRFDSV 799 

Qy 924 GTGCAGGTGTACCATCTCCCAGTCAGAGACCCAGTAATCAGAGCAGCTAATGGGAGGCAT 983

M I : I I : : : : : | : I : : : : : 

Db 800 ITGPNGACVPAFCDDQSPDAYSAYVLASLAGEELQAVESSPKFNPNAIGVPQPYLNKLNY 859

Qy 984 GCTCCTTGGGTGGTGGCCAACTTGTCATTATACCTCCAAGGACAACAGAGTGGTACATAA 1043 

: | : : :::: ::| :| : : I I : 

Db 860 RRTDHEDPRVKKTAFQISMAKPRPNSAEESNGPIYAFENLRACEEAPPSAAHFRFYQIEG 919 

Qy 1044 GGCTAAAACAGAGTTGTCAACCTGTCCAGGGGCAACTGGGATGGGGTAGGGCTGGGAGCA 1103 

Db 920 DRYDYNTVPFNEDDPMSWTEDYLAWWPKPMEFRACYIKVKIVGPLEVNVRSRNMGGTHRR 979 

Qy 1104 GGGGTCTGGCACCTTCCAGGACCCTACTCTGCCTTTGCCCTTGTGGGATTTCCTTTAAAG 1163 

Db 980 TVGKLYGIRDVRSTRDRDQPNVSAACLEFKCSGMLYDQDRVDRTLVKVIPQGSCRRASVN 1039 

Qy 1164 CAACCGTGTCGGGCCTTGGTGGAACATCAAATCATGCCAGCAGAAGTGGGACAGGCAAAT 1223



Db 1040 PMLHEYLVNHLPLAVNNDTSEYTMLAPLDPLGHNYGIYTVTDQDPRTAKEIALGRCFDGT 1099 



Qy 1224 C 1224 

Db 1100 S 1100 



RESULT 21 
US-10-141-759-412 

Sequence 412, Application US/10141759 
Publication No. US20030207361A1 
GENERAL INFORMATION: 
APPLICANT: Baker, Kevin P.
APPLICANT: Beresini, Maureen
APPLICANT: DeForge, Laura
APPLICANT: Desnoyers, Luc
APPLICANT: Filvaroff, Ellen
APPLICANT: Gao, Wei-Qiang
APPLICANT: Gerritsen, Mary E.
APPLICANT: Goddard, Audrey
APPLICANT: Godowski, Paul J.
APPLICANT: Gurney, Austin L.
APPLICANT: Sherwood, Steven
APPLICANT: Smith, Victoria
APPLICANT: Stewart, Timothy A.
APPLICANT: Tumas, Daniel
APPLICANT: Watanabe, Colin K
APPLICANT: Wood, William
APPLICANT: Zhang, Zemin

TITLE OF INVENTION: SECRETED AND TRANSMEMBRANE POLYPEPTIDES AND NUCLEIC 
TITLE OF INVENTION: ACIDS ENCODING THE SAME 
FILE REFERENCE: P3330R1C197 

CURRENT APPLICATION NUMBER: US/10/141,759
CURRENT FILING DATE: 2002-05-08

Prior Application removed - See File Wrapper or Palm
NUMBER OF SEQ ID NOS: 550
SEQ ID NO 412 
LENGTH: 1184 
TYPE: PRT 

ORGANISM: Homo Sapien 
US-10-141-759-412 

Query Match 2.5%; Score 39.2; DB 16; Length 1184; 

Best Local Similarity 9.3%; Pred. No. 0.092; 

Matches 73; Conservative 219; Mismatches 484; Indels 5; Gaps 1;

Qy 449 AGAGGGAGCCAGAGGGCCTCACATCAACAGAGGGTCTCTGAGCTCCCTGGAGCAAGGTTC 508

Db 320 RRAGQSVSLCCKATGKPRPDKYFWYHNDTLLDPSLYKHESKLVLRKLQQHQAGEYFCKAQ 379

Qy 509 GGTCACGGGCACAGAGGCTCGGCACAGCTTAGGTGTCCTGCATGTGTCCTACAGCGTCAG 568

 :: |:::::| : :: I : : : : I II :: ::llll

Db 380 SDAGAVKSKVAQLIVTASDETPCNPVPESYLIRLPHDCFQNATNSFYYDVGRCPVKTCAG 439

Qy 569 GTAAGGGGACCTCCACAGCAAAAAGCTAGGCTCTCTGATTGCCTTTTCTGAATGGG 624

 : | : : I : : : I : I : : I : : I II: : I :

Db 440 QQDNGIRCRDAVQNCCGISKTEEREIQCSGYTLPTKVAKECSCQRCTETRSIVRGRVSAA 499



Qy 625 -TGGGTGGGCCTGTGGGCTTTGGGTTGTCTGTCCAGCAGATCAGGGTGAAAGTGGACAGT 683

| : | : : : : : : | | : | | : : : I : : : : : I : : : 

Db 500 DNGEPMRFGHVYMGNSRVSMTGYKGTFTLHVPQDTERLVLTFVDRLQKFVNTTKVLPFNK 559 

Qy 684 CTGTAACAACAGTGAGTCGTTCCTCCTCCTCCTCCTGCGCAGGGCAGAGCCTGGACATTA 743

Db 560 KGSAVFHEIKMLRRKEPITLEAMETNIIPLGEWGEDPMAELEIPSRSFYRQNGEPYIGK 619 

Qy 744 AAACATGCCCTGCCTGAAGCCGCTTGCTGCTTCTCACTGATTTCTGCTCTCCCCTTCCTT 803

: | : : I : I I : : : I : : I : : : I : : 

Db 620 VKASVTFLDPRNISTATAAQTDLNFINDEGDTFPLRTYGMFSVDFRDEVTSEPLNAGKVK 679

Qy 804 GACTCGCCCACCACCTGTCCTGTGTAGATGGAGAAGGCTCGGAGAGTGGGGGTGCTGGGG 863

:::::::: I : : : I : I : : :::::: : : : 

Db 680 VHLDSTQVKMPEHISTVKLWSLNPDTGLWEEEGDFKFENQRRNKREDRTFLVGNLEIRER 739

Qy 864 GCACAAAATGGAATGAACACTGCTGAAGGAATGCAGGGTTCACTTCAAGAAGAAAGCAGT 923

Db 740 RLFNLDVPESRRCFVKVRAYRSERFLPSEQIQGWISVINLEPRTGFLSNPRAWGRFDSV 799 

Qy 924 GTGCAGGTGTACCATCTCCCAGTCAGAGACCCAGTAATCAGAGCAGCTAATGGGAGGCAT 983

|||:||:: : : : I : I : : : : : 

Db 800 ITGPNGACVPAFCDDQSPDAYSAYVLASLAGEELQAVESSPKFNPNAIGVPQPYLNKLNY 859 

Qy 984 GCTCCTTGGGTGGTGGCCAACTTGTCATTATACCTCCAAGGACAACAGAGTGGTACATAA 1043

: | : : : : : : : : i : I : : I I : 

Db 860 RRTDHEDPRVKKTAFQISMAKPRPNSAEESNGPIYAFENLRACEEAPPSAAHFRFYQIEG 919

Qy 1044 GGCTAAAACAGAGTTGTCAACCTGTCCAGGGGCAACTGGGATGGGGTAGGGCTGGGAGCA 1103 

Db 920 DRYDYNTVPFNEDDPMSWTEDYLAWWPKPMEFRACYIKVKIVGPLEVNVRSRNMGGTHRR 979

Qy 1104 GGGGTCTGGCACCTTCCAGGACCCTACTCTGCCTTTGCCCTTGTGGGATTTCCTTTAAAG 1163 

Db 980 TVGKLYGIRDVRSTRDRDQPNVSAACLEFKCSGMLYDQDRVDRTLVKVIPQGSCRRASVN 1039

Qy 1164 CAACCGTGTCGGGCCTTGGTGGAACATCAAATCATGCCAGCAGAAGTGGGACAGGCAAAT 1223

Db 1040 PMLHEYLVNHLPLAVNNDTSEYTMLAPLDPLGHNYGIYTVTDQDPRTAKEIALGRCFDGT 1099

Qy 1224 C 1224 

Db 1100 S 1100 



RESULT 22 
US-10-140-805-412 

Sequence 412, Application US/10140805 
Publication No. US20030207417A1 
GENERAL INFORMATION: 
APPLICANT: Baker, Kevin P.
APPLICANT: Beresini, Maureen
APPLICANT: DeForge, Laura
APPLICANT: Desnoyers, Luc
APPLICANT: Filvaroff, Ellen
APPLICANT: Gao, Wei-Qiang
APPLICANT: Gerritsen, Mary E.
APPLICANT: Goddard, Audrey
APPLICANT: Godowski, Paul J.
APPLICANT: Gurney, Austin L.
APPLICANT: Sherwood, Steven
APPLICANT: Smith, Victoria
APPLICANT: Stewart, Timothy A.
APPLICANT: Tumas, Daniel
APPLICANT: Watanabe, Colin K
APPLICANT: Wood, William
APPLICANT: Zhang, Zemin

TITLE OF INVENTION: SECRETED AND TRANSMEMBRANE POLYPEPTIDES AND NUCLEIC 
TITLE OF INVENTION: ACIDS ENCODING THE SAME 
FILE REFERENCE: P3330R1C176 

CURRENT APPLICATION NUMBER: US/10/140,805
CURRENT FILING DATE: 2002-05-07

Prior Application removed - See File Wrapper or Palm
NUMBER OF SEQ ID NOS: 550
SEQ ID NO 412
LENGTH: 1184
TYPE: PRT 

ORGANISM: Homo Sapien 
US-10-140-805-412 

Query Match 2.5%; Score 39.2; DB 16; Length 1184; 

Best Local Similarity 9.3%; Pred. No. 0.092; 

Matches 73; Conservative 219; Mismatches 484; Indels 5; Gaps 1; 

Qy 449 AGAGGGAGCCAGAGGGCCTCACATCAACAGAGGGTCTCTGAGCTCCCTGGAGCAAGGTTC 508

: : | | : : : I : I I : : : : : ::::::: : 

Db 320 RRAGQSVSLCCKATGKPRPDKYFWYHNDTLLDPSLYKHESKLVLRKLQQHQAGEYFCKAQ 379

Qy 509 GGTCACGGGCACAGAGGCTCGGCACAGCTTAGGTGTCCTGCATGTGTCCTACAGCGTCAG 568 

: : | : : : : : I : : : I : : : : I II : : : : I I I I 

Db 380 SDAGAVKSKVAQLIVTASDETPCNPVPESYLIRLPHDCFQNATNSFYYDVGRCPVKTCAG 439 

Qy 569 GTAAGGGGACCTCCACAGCAAAAAGCTAGGCTCTCTGATTGCCTTTTCTGAATGGG 624 

: | : : I : : : I : I : : I : : I I h : I : 

Db 440 QQDNGIRCRDAVQNCCGISKTEEREIQCSGYTLPTKVAKECSCQRCTETRSIVRGRVSAA 499 

Qy 625 -TGGGTGGGCCTGTGGGCTTTGGGTTGTCTGTCCAGCAGATCAGGGTGAAAGTGGACAGT 683

| : |::: ::: || : I I :: : | ::: : : |:: : 

Db 500 DNGEPMRFGHVYMGNSRVSMTGYKGTFTLHVPQDTERLVLTFVDRLQKFVNTTKVLPFNK 559 

Qy 684 CTGTAACAACAGTGAGTCGTTCCTCCTCCTCCTCCTGCGCAGGGCAGAGCCTGGACATTA 743 

: : : : : : : : : : | 

Db 560 KGSAVFHEIKMLRRKEPITLEAMETNIIPLGEWGEDPMAELEIPSRSFYRQNGEPYIGK 619 

Qy 744 AAACATGCCCTGCCTGAAGCCGCTTGCTGCTTCTCACTGATTTCTGCTCTCCCCTTCCTT 803 

: | : : | : I I : : : I : : I : : : I : = 

Db 620 VKASVTFLDPRNISTATAAQTDLNFINDEGDTFPLRTYGMFSVDFRDEVTSEPLNAGKVK 679

Qy 804 GACTCGCCCACCACCTGTCCTGTGTAGATGGAGAAGGCTCGGAGAGTGGGGGTGCTGGGG 863

Db 680 VHLDSTQVKMPEHISTVKLWSLNPDTGLWEEEGDFKFENQRRNKREDRTFLVGNLEIRER 739 



Qy 864 GCACAAAATGGAATGAACACTGCTGAAGGAATGCAGGGTTCACTTCAAGAAGAAAGCAGT 923



Db 740 RLFNLDVPESRRCFVKVRAYRSERFLPSEQIQGWISVINLEPRTGFLSNPRAWGRFDSV 799

Qy 924 GTGCAGGTGTACCATCTCCCAGTCAGAGACCCAGTAATCAGAGCAGCTAATGGGAGGCAT 983

I I I : I I:: : :: |: |: : : : : 

Db 800 ITGPNGACVPAFCDDQSPDAYSAYVLASLAGEELQAVESSPKFNPNAIGVPQPYLNKLNY 859

Qy 984 GCTCCTTGGGTGGTGGCCAACTTGTCATTATACCTCCAAGGACAACAGAGTGGTACATAA 1043 

: | : : :::: ::| :| : :| | : 

Db 860 RRTDHEDPRVKKTAFQISMAKPRPNSAEESNGPIYAFENLRACEEAPPSAAHFRFYQIEG 919 

Qy 1044 GGCTAAAACAGAGTTGTCAACCTGTCCAGGGGCAACTGGGATGGGGTAGGGCTGGGAGCA 1103 

:::: : :: : : | : ::: | ::: | | : 

Db 920 DRYDYNTVPFNEDDPMSWTEDYLAWWPKPMEFRACYIKVKIVGPLEVNVRSRNMGGTHRR 979 

Qy 1104 GGGGTCTGGCACCTTCCAGGACCCTACTCTGCCTTTGCCCTTGTGGGATTTCCTTTAAAG 1163 

: | : : : : : I : : : : : : : : : | : : : : | : 

Db 980 TVGKLYGIRDVRSTRDRDQPNVSAACLEFKCSGMLYDQDRVDRTLVKVIPQGSCRRASVN 1039 

Qy 1164 CAACCGTGTCGGGCCTTGGTGGAACATCAAATCATGCCAGCAGAAGTGGGACAGGCAAAT 1223

: : : : : : : | : : : | : : : : : | : I : I : I 

Db 1040 PMLHEYLVNHLPLAVNNDTSEYTMLAPLDPLGHNYGIYTVTDQDPRTAKEIALGRCFDGT 1099

Qy 1224 C 1224 

Db 1100 S 1100 



RESULT 23 
US-10-140-864-412 

Sequence 412, Application US/10140864 
Publication No. US20030207419A1
GENERAL INFORMATION: 
APPLICANT: Baker, Kevin P.
APPLICANT: Beresini, Maureen
APPLICANT: DeForge, Laura
APPLICANT: Desnoyers, Luc
APPLICANT: Filvaroff, Ellen
APPLICANT: Gao, Wei-Qiang
APPLICANT: Gerritsen, Mary E.
APPLICANT: Goddard, Audrey
APPLICANT: Godowski, Paul J.
APPLICANT: Gurney, Austin L.
APPLICANT: Sherwood, Steven
APPLICANT: Smith, Victoria
APPLICANT: Stewart, Timothy A.
APPLICANT: Tumas, Daniel
APPLICANT: Watanabe, Colin K
APPLICANT: Wood, William
APPLICANT: Zhang, Zemin

TITLE OF INVENTION: SECRETED AND TRANSMEMBRANE POLYPEPTIDES AND NUCLEIC 
TITLE OF INVENTION: ACIDS ENCODING THE SAME 
FILE REFERENCE: P3330R1C184 

CURRENT APPLICATION NUMBER: US/10/140,864
CURRENT FILING DATE: 2002-05-07 

Prior Application removed - See Palm or File Wrapper 
NUMBER OF SEQ ID NOS : 550 



SEQ ID NO 412 
LENGTH: 1184 
TYPE: PRT 

ORGANISM: Homo Sapien 
US-10-140-864-412 

Query Match 2.5%; Score 39.2; DB 16; Length 1184; 

Best Local Similarity 9.3%; Pred. No. 0.092; 

Matches 73; Conservative 219; Mismatches 484; Indels 5; Gaps 1; 

Qy 449 AGAGGGAGCCAGAGGGCCTCACATCAACAGAGGGTCTCTGAGCTCCCTGGAGCAAGGTTC 508 

: : I I : : : I : I I : : : : : ::::::: : 

Db 320 RRAGQSVSLCCKATGKPRPDKYFWYHNDTLLDPSLYKHESKLVLRKLQQHQAGEYFCKAQ 379

Qy 509 GGTCACGGGCACAGAGGCTCGGCACAGCTTAGGTGTCCTGCATGTGTCCTACAGCGTCAG 568

: : | : : : : : | : : : I : : : : I II : : - = 1111 

Db 380 SDAGAVKSKVAQLIVTASDETPCNPVPESYLIRLPHDCFQNATNSFYYDVGRCPVKTCAG 439

Qy 569 GTAAGGGGACCTCCACAGCAAAAAGCTAGGCTCTCTGATTGCCTTTTCTGAATGGG 624 

: | : : I : : : I : I :: I : : I I I : : I : 

Db 440 QQDNGIRCRDAVQNCCGISKTEEREIQCSGYTLPTKVAKECSCQRCTETRSIVRGRVSAA 499 

Qy 625 -TGGGTGGGCCTGTGGGCTTTGGGTTGTCTGTCCAGCAGATCAGGGTGAAAGTGGACAGT 683 

| : | : : : : : : I I : I I : : : I : : : : : I : : = 

Db 500 DNGEPMRFGHVYMGNSRVSMTGYKGTFTLHVPQDTERLVLTFVDRLQKFVNTTKVLPFNK 559

Qy 684 CTGTAACAACAGTGAGTCGTTCCTCCTCCTCCTCCTGCGCAGGGCAGAGCCTGGACATTA 743

: : : : : : : : : : I 

Db 560 KGSAVFHEIKMLRRKEPITLEAMETNIIPLGEWGEDPMAELEIPSRSFYRQNGEPYIGK 619

Qy 744 AAACATGCCCTGCCTGAAGCCGCTTGCTGCTTCTCACTGATTTCTGCTCTCCCCTTCCTT 803

: | : : I : I I : : : I : : I : : : I : : 

Db 620 VKASVTFLDPRNISTATAAQTDLNFINDEGDTFPLRTYGMFSVDFRDEVTSEPLNAGKVK 679 

Qy 804 GACTCGCCCACCACCTGTCCTGTGTAGATGGAGAAGGCTCGGAGAGTGGGGGTGCTGGGG 863

Db 680 VHLDSTQVKMPEHISTVKLWSLNPDTGLWEEEGDFKFENQRRNKREDRTFLVGNLEIRER 739

Qy 864 GCACAAAATGGAATGAACACTGCTGAAGGAATGCAGGGTTCACTTCAAGAAGAAAGCAGT 923

: ::::::: | : : : | : : : : I : : | : : : : 

Db 740 RLFNLDVPESRRCFVKVRAYRSERFLPSEQIQGWISVINLEPRTGFLSNPRAWGRFDSV 799 

Qy 924 GTGCAGGTGTACCATCTCCCAGTCAGAGACCCAGTAATCAGAGCAGCTAATGGGAGGCAT 983

I I I : I I:: : :: |: |: : : : • 

Db 800 ITGPNGACVPAFCDDQSPDAYSAYVLASLAGEELQAVESSPKFNPNAIGVPQPYLNKLNY 859 

Qy 984 GCTCCTTGGGTGGTGGCCAACTTGTCATTATACCTCCAAGGACAACAGAGTGGTACATAA 1043 

Db 860 RRTDHEDPRVKKTAFQISMAKPRPNSAEESNGPIYAFENLRACEEAPPSAAHFRFYQIEG 919 

Qy 1044 GGCTAAAACAGAGTTGTCAACCTGTCCAGGGGCAACTGGGATGGGGTAGGGCTGGGAGCA 1103

:::: : :: : : | : : : : I I I 

Db 920 DRYDYNTVPFNEDDPMSWTEDYLAWWPKPMEFRACYIKVKIVGPLEVNVRSRNMGGTHRR 979 

Qy 1104 GGGGTCTGGCACCTTCCAGGACCCTACTCTGCCTTTGCCCTTGTGGGATTTCCTTTAAAG 1163

Db 980 TVGKLYGIRDVRSTRDRDQPNVSAACLEFKCSGMLYDQDRVDRTLVKVIPQGSCRRASVN 1039



Qy 1164 CAACCGTGTCGGGCCTTGGTGGAACATCAAATCATGCCAGCAGAAGTGGGACAGGCAAAT 1223

: : : : : : : I : : : | : : : : : I : I : I : I 

Db 1040 PMLHEYLVNHLPLAVNNDTSEYTMLAPLDPLGHNYGIYTVTDQDPRTAKEIALGRCFDGT 1099 

Qy 1224 C 1224 

Db 1100 S 1100 



RESULT 24
US-10-184-644-524 

Sequence 524, Application US/10184644 
Publication No. US20030044930A1
GENERAL INFORMATION: 
APPLICANT: Baker, Kevin P.
APPLICANT: Chen, Jian
APPLICANT: Desnoyers, Luc
APPLICANT: Goddard, Audrey
APPLICANT: Godowski, Paul J.
APPLICANT: Gurney, Austin L.
APPLICANT: Pan, James
APPLICANT: Smith, Victoria
APPLICANT: Watanabe, Colin K.
APPLICANT: Wood, William I.
APPLICANT: Zhang, Zemin

TITLE OF INVENTION: SECRETED AND TRANSMEMBRANE POLYPEPTIDES AND NUCLEIC 
TITLE OF INVENTION: ACIDS ENCODING THE SAME 
FILE REFERENCE: P3430R1C227 

CURRENT APPLICATION NUMBER: US/10/184,644
CURRENT FILING DATE: 2002-06-28 

Prior Application removed - See File Wrapper or Palm 
NUMBER OF SEQ ID NOS : 612 
SEQ ID NO 524 
LENGTH: 686 
TYPE: PRT 

ORGANISM: Homo Sapien 
US-10-184-644-524 

Query Match 2.4%; Score 38.2; DB 15; Length 686; 

Best Local Similarity 11.9%; Pred. No. 0.15; 

Matches 47; Conservative 108; Mismatches 241; Indels 0; Gaps 0; 

Qy 944 AGTCAGAGACCCAGTAATCAGAGCAGCTAATGGGAGGCATGCTCCTTGGGTGGTGGCCAA 1003 

| | : : : | | : : : : : : : : : : I I : : : : I : : : I : : 

Db 178 ALFCQQLWRMGMLGTRVLSLVLFYKAYHFWVFWAGAHWLVMTFWLVAQQSDIIDSTCHW 237

Qy 1004 CTTGTCATTATACCTCCAAGGACAACAGAGTGGTACATAAGGCTAAAACAGAGTTGTCAA 1063

: : : | :::::: : : : : : I : I I : : I : 

Db 238 RLFNLLVGAVYILCYLSFWDSPSRNRMVTFYMVMLLENIILLLLATDFLQGASWTSLQTI 297

Qy 1064 CCTGTCCAGGGGCAACTGGGATGGGGTAGGGCTGGGAGCAGGGGTCTGGCACCTTCCAGG 1123 

| : : : : : : : : : : I I : : : I I I I : : I : : : I : : : 
Db 298 AGVLSGFLIGSVSLVIYYSLLHPKSTDIWQGCLRKSCGIAGGDKTERRDSPRATDLAGKR 357 

Qy 1124 ACCCTACTCTGCCTTTGCCCTTGTGGGATTTCCTTTAAAGCAACCGTGTCGGGCCTTGGT 1183 



Db 358 TESSGSCQGASYEPTILGKPPTPEQVPPEAGLGTQVAVEDSFLSHHHWLWVKLALKTGNV 417



Qy 1184 GGAACATCAAATCATGCCAGCAGAAGTGGGACAGGCAAATCCTCAAAGATGTCTCCTTGT 1243

Db 418 SKINAAFGDNSPAYCPPAWGLSQQDYLQRKALSAQQELPSSSRDPSTLENSSAFEGVPKA 477 

Qy 1244 ACATCGAGAGTGGCCAGATTATGTGCATCTTAGGCAGCTCAGGTAAGTGCCTGGGGGGSC 1303 

I I |:: |:| :|| : 

Db 478 EADPLETSSYVSFASDQQDEAPTQNPAATQGEGTPKEGADAVSGTQGKGTGGQQRGGEGQ 537 



Qy 1304 SGGGGCTCCTGTACTTCTAAGGCAGGCTCTGGGAGG 1339 

: : : : I : I : I : I I : : I : : I 
Db 538 QSSTLYFSATAEVATSSQQEGSPATLQTAHSGRRLG 573 



RESULT 25 
US-10-184-634-524 

Sequence 524, Application US/10184634 
Publication No. US20030068684A1
GENERAL INFORMATION: 
APPLICANT: Baker, Kevin P.
APPLICANT: Chen, Jian
APPLICANT: Desnoyers, Luc
APPLICANT: Goddard, Audrey
APPLICANT: Godowski, Paul J.
APPLICANT: Gurney, Austin L.
APPLICANT: Pan, James
APPLICANT: Smith, Victoria
APPLICANT: Watanabe, Colin K.
APPLICANT: Wood, William I.
APPLICANT: Zhang, Zemin

TITLE OF INVENTION: SECRETED AND TRANSMEMBRANE POLYPEPTIDES AND NUCLEIC 
TITLE OF INVENTION: ACIDS ENCODING THE SAME 
FILE REFERENCE: P3430R1C217 

CURRENT APPLICATION NUMBER: US/10/184,634
CURRENT FILING DATE: 2002-06-28 

Prior Application removed - See File Wrapper or Palm 
NUMBER OF SEQ ID NOS : 612 
SEQ ID NO 524 
LENGTH: 686 
TYPE: PRT 

ORGANISM: Homo Sapien 
US-10-184-634-524 



Query Match 2.4%; Score 38.2; DB 15; Length 686; 

Best Local Similarity 11.9%; Pred. No. 0.15; 

Matches 47; Conservative 108; Mismatches 241; Indels 0; Gaps 0; 
Qy 944 AGTCAGAGACCCAGTAATCAGAGCAGCTAATGGGAGGCATGCTCCTTGGGTGGTGGCCAA 1003



Db 178 ALFCQQLWRMGMLGTRVLSLVLFYKAYHFWVFWAGAHWLVMTFWLVAQQSDIIDSTCHW 237 



Qy 1004 CTTGTCATTATACCTCCAAGGACAACAGAGTGGTACATAAGGCTAAAACAGAGTTGTCAA 1063

: : : )::::::: : : : : | : | | : : I : 

Db 238 RLFNLLVGAVYILCYLSFWDSPSRNRMVTFYMVMLLENIILLLLATDFLQGASWTSLQTI 297 

Qy 1064 CCTGTCCAGGGGCAACTGGGATGGGGTAGGGCTGGGAGCAGGGGTCTGGCACCTTCCAGG 1123 



Db 298 AGVLSGFLIGSVSLVIYYSLLHPKSTDIWQGCLRKSCGIAGGDKTERRDSPRATDLAGKR 357

Qy 1124 ACCCTACTCTGCCTTTGCCCTTGTGGGATTTCCTTTAAAGCAACCGTGTCGGGCCTTGGT 1183

 :: | :: I 1 : | |: :: :: : :: :||

Db 358 TESSGSCQGASYEPTILGKPPTPEQVPPEAGLGTQVAVEDSFLSHHHWLWVKLALKTGNV 417

Qy 1184 GGAACATCAAATCATGCCAGCAGAAGTGGGACAGGCAAATCCTCAAAGATGTCTCCTTGT 1243

Db 418 SKINAAFGDNSPAYCPPAWGLSQQDYLQRKALSAQQELPSSSRDPSTLENSSAFEGVPKA 477

Qy 1244 ACATCGAGAGTGGCCAGATTATGTGCATCTTAGGCAGCTCAGGTAAGTGCCTGGGGGGSC 1303

 : : : : : : | | 1 I : : I : I : II :

Db 478 EADPLETSSYVSFASDQQDEAPTQNPAATQGEGTPKEGADAVSGTQGKGTGGQQRGGEGQ 537

Qy 1304 SGGGGCTCCTGTACTTCTAAGGCAGGCTCTGGGAGG 1339

 : : : : I : 1 : I : I I : : I : : 1

Db 538 QSSTLYFSATAEVATSSQQEGSPATLQTAHSGRRLG 573





RESULT 26

US-10-027-632-147012 

; Sequence 147012, Application US/10027632 

; Publication No. US20020198371A1 

; GENERAL INFORMATION: 

; APPLICANT: Wang, David G. 

; TITLE OF INVENTION: Identification and Mapping of Single Nucleotide 
; TITLE OF INVENTION: Polymorphisms in the Human Genome 
; FILE REFERENCE: 108827.129 

; CURRENT APPLICATION NUMBER: US/10/027,632
; CURRENT FILING DATE: 2002-04-30 
; PRIOR APPLICATION NUMBER: US 60/218,006 
; PRIOR FILING DATE: 2000-07-12 
; PRIOR APPLICATION NUMBER: US 60/198,676 
; PRIOR FILING DATE: 2000-04-20 
; PRIOR APPLICATION NUMBER: US 60/193,483 
; PRIOR FILING DATE: 2000-03-29 
; PRIOR APPLICATION NUMBER: US 60/185,218 
; PRIOR FILING DATE: 2000-02-24 
; PRIOR APPLICATION NUMBER: US 60/167,363 
; PRIOR FILING DATE: 1999-11-23 
; PRIOR APPLICATION NUMBER: US 60/156,358 
; PRIOR FILING DATE: 1999-09-28 
; PRIOR APPLICATION NUMBER: US 60/146,002 

PRIOR FILING DATE: 1999-08-09 
; NUMBER OF SEQ ID NOS : 325720 
; SOFTWARE: FastSEQ for Windows Version 4.0 
; SEQ ID NO 147012 

LENGTH: 744

TYPE: DNA 

ORGANISM: Human 
US-10-027-632-147012 

Query Match 2.3%; Score 36.6; DB 13; Length 744; 

Best Local Similarity 58.9%; Pred. No. 0.54; 

Matches 63; Conservative 0; Mismatches 44; Indels 0; Gaps 0;



Qy 1396 GCCATGCATTTGGCATTTGAATACAATCTGGTGACTTGTCTGGCTGCCAATAGAACCTAG 1455

III I I I I I I I I I I I I I III I I I I I I I I I II 

Db 464 GTCAAATATTTTACACTTAACTACATGCTGTCACATAAAATATCTCCAAATAACTTCAAA 523

Qy 1456 TACCAAAGTGAAATCTTGAGGAAAATCCCTGGAAAGAGTGGAAAGTC 1502

I I I I II II I I I II I I I I I I I I I I I I I I I I I I I I 

Db 524 TTCTAACCTGTAACCAAATGTGAAATCCCTGGGAAGACTGGAAAGTC 570



RESULT 27 

US-10-027-632-147012

Sequence 147012, Application US/10027632 
Publication No. US20030204075A9 
GENERAL INFORMATION: 
APPLICANT: Wang, David G. 

TITLE OF INVENTION: Identification and Mapping of Single Nucleotide 
TITLE OF INVENTION: Polymorphisms in the Human Genome 
FILE REFERENCE: 108827.129 

CURRENT APPLICATION NUMBER: US/10/027,632
CURRENT FILING DATE: 2002-04-30 
PRIOR APPLICATION NUMBER: US 60/218,006 
PRIOR FILING DATE: 2000-07-12 
PRIOR APPLICATION NUMBER: US 60/198,676 
PRIOR FILING DATE: 2000-04-20 
PRIOR APPLICATION NUMBER: US 60/193,483 
PRIOR FILING DATE: 2000-03-29 
PRIOR APPLICATION NUMBER: US 60/185,218 
PRIOR FILING DATE: 2000-02-24 
PRIOR APPLICATION NUMBER: US 60/167,363 
PRIOR FILING DATE: 1999-11-23 
PRIOR APPLICATION NUMBER: US 60/156,358 
PRIOR FILING DATE: 1999-09-28 
PRIOR APPLICATION NUMBER: US 60/146,002 
PRIOR FILING DATE: 1999-08-09 
NUMBER OF SEQ ID NOS : 325720 
SOFTWARE: FastSEQ for Windows Version 4.0
SEQ ID NO 147012 
LENGTH: 744 
TYPE: DNA 
ORGANISM: Human 
US-10-027-632-147012

Query Match 2.3%; Score 36.6; DB 16; Length 744; 

Best Local Similarity 58.9%; Pred. No. 0.54; 

Matches 63; Conservative 0; Mismatches 44; Indels 0; Gaps 0; 

Qy 1396 GCCATGCATTTGGCATTTGAATACAATCTGGTGACTTGTCTGGCTGCCAATAGAACCTAG 1455 

III I II I I I I I I I I I I III I I I I I I I I I II 

Db 464 GTCAAATATTTTACACTTAACTACATGCTGTCACATAAAATATCTCCAAATAACTTCAAA 523

Qy 1456 TACCAAAGTGAAATCTTGAGGAAAATCCCTGGAAAGAGTGGAAAGTC 1502

I I I I Mill I I I I I I I I I I I I I I I I I I I I I I I I 

Db 524 TTCTAACCTGTAACCAAATGTGAAATCCCTGGGAAGACTGGAAAGTC 570



RESULT 28

US-10-377-079-11/c



Sequence 11, Application US/10377079 
Publication No. US20030236395A1 
GENERAL INFORMATION: 
APPLICANT: Huang, Shi 

TITLE OF INVENTION: PR-Domain Containing Nucleic Acids, Polypeptides, 
TITLE OF INVENTION: Antibodies and Methods 
FILE REFERENCE: P-LJ 3611 

CURRENT APPLICATION NUMBER: US/10/377,079
CURRENT FILING DATE: 2003-02-28
PRIOR APPLICATION NUMBER: US/09/389,956.
PRIOR FILING DATE: 1999-09-03
NUMBER OF SEQ ID NOS: 93
SOFTWARE: PatentIn Ver. 2.0
SEQ ID NO 11
LENGTH: 2236
TYPE: DNA

ORGANISM: Homo sapiens
FEATURE:
NAME/KEY: CDS
LOCATION: (1)..(1455)
US-10-377-079-11 

Query Match 2.3%; Score 36.2; DB 16; Length 2236; 

Best Local Similarity 50.3%; Pred. No. 1.4; 

Matches 83; Conservative 2; Mismatches 80; Indels 0; Gaps 0; 

Qy 1174 GGGCCTTGGTGGAACATCAAATCATGCCAGCAGAAGTGGGACAGGCAAATCCTCAAAGAT 1233

I I I I I I I I I I I I I I I III I I I I I I I I I I I I 

Db 713 GTGCCTTGCTGGATCCTCTGCGCCGCGCAGATGCCGTAGGCCAGGCCGGGCACAGTACTG 654 

Qy 1234 GTCTCCTTGTACATCGAGAGTGGCCAGATTATGTGCATCTTAGGCAGCTCAGGTAAGTGC 1293 

M I I I I I I I I II I II II I I I I I I I I II III 

Db 653 GTGCAGAGGCACACCTCGCGAGGCAGGTCCCGCAGCCACTCCGGCAGCTCCGGCGGGGGC 594 

Qy 1294 CTGGGGGGSCSGGGGCTCCTGTACTTCTAAGGCAGGCTCTGGGAG 1338 

I I I I : : I II I I I I I I I I I I I I I I I I 

Db 593 GCGGCGGCCGCAGCGCTGCTGGTGCCCACAAGCCGGCGCAGCGAG 549



RESULT 29 
US-10-377-079-9/c 

; Sequence 9, Application US/10377079 
; Publication No. US20030236395A1 
; GENERAL INFORMATION: 
; APPLICANT: Huang, Shi 

; TITLE OF INVENTION: PR-Domain Containing Nucleic Acids, Polypeptides, 
; TITLE OF INVENTION: Antibodies and Methods 
FILE REFERENCE: P-LJ 3611 

; CURRENT APPLICATION NUMBER: US/10/377,079
; CURRENT FILING DATE: 2003-02-28
; PRIOR APPLICATION NUMBER: US/09/389,956.
; PRIOR FILING DATE: 1999-09-03
; NUMBER OF SEQ ID NOS: 93
; SOFTWARE: PatentIn Ver. 2.0
; SEQ ID NO 9
; LENGTH: 2488
; TYPE: DNA
; ORGANISM: Homo sapiens
; FEATURE:
; NAME/KEY: CDS
; LOCATION: (1)..(1707)
US-10-377-079-9 

Query Match 2.3%; Score 36.2; DB 16; Length 2488; 

Best Local Similarity 50.3%; Pred. No. 1.5; 

Matches 83; Conservative 2; Mismatches 80; Indels 0; Gaps 0; 

Qy 1174 GGGCCTTGGTGGAACATCAAATCATGCCAGCAGAAGTGGGACAGGCAAATCCTCAAAGAT 1233

I I I I I II I I I I I I I I III I I I I I I I I I I I I 

Db 713 GTGCCTTGCTGGATCCTCTGCGCCGCGCAGATGCCGTAGGCCAGGCCGGGCACAGTACTG 654 

Qy 1234 GTCTCCTTGTACATCGAGAGTGGCCAGATTATGTGCATCTTAGGCAGCTCAGGTAAGTGC 1293 

II I I I I I I I I I I I II II I I I I I I I I I I III 

Db 653 GTGCAGAGGCACACCTCGCGAGGCAGGTCCCGCAGCCACTCCGGCAGCTCCGGCGGGGGC 594 

Qy 1294 CTGGGGGGSCSGGGGCTCCTGTACTTCTAAGGCAGGCTCTGGGAG 1338

I I I I : : I I I I I I I I I I I I I I I I I 1 I 

Db 593 GCGGCGGCCGCAGCGCTGCTGGTGCCCACAAGCCGGCGCAGCGAG 549 



RESULT 30 

US-10-424-599-37000/c 

; Sequence 37000, Application US/10424599 
; Publication No. US20040031072A1 
; GENERAL INFORMATION: 

; APPLICANT: La Rosa Thomas J
; APPLICANT: Kovalic David K
; APPLICANT: Zhou Yihua
; APPLICANT: Cao Yongwei

; TITLE OF INVENTION: Soy Nucleic Acid Molecules and Other Molecules Associated With
; TITLE OF INVENTION: Plants and Uses Thereof for Plant Improvement
; FILE REFERENCE: 38-21(53223)B
; CURRENT APPLICATION NUMBER: US/10/424,599
; CURRENT FILING DATE: 2003-04-28
; NUMBER OF SEQ ID NOS: 285684
; SEQ ID NO 37000
; LENGTH: 456
; TYPE: DNA
; ORGANISM: Glycine max
; FEATURE:
; OTHER INFORMATION: Clone ID: PAT_MRT3847_133412C.1
US-10-424-599-37000 



Query Match 2.3%; Score 36; DB 13; Length 456; 

Best Local Similarity 51.3%; Pred. No. 0.65; 

Matches 81; Conservative 1; Mismatches 76; Indels 0; Gaps 0; 

Qy 33 GCTGGGTCTCTTCTTTGGTTTTCTCAGCCATGACCAGTGCTGTTTGTGCCCTTTGTGTGG 92 

I I I I I I I I I II I I I I I I I I I I I I I I I II I I I I I 

Db 267 GCGGGGGCCCCCCTTTTGTTTTTTGCCGGTTTACCCCCGTTTTTTAACCCGCTTTTGGTG 208 

Qy 93 CCTCCCCTGCTGTTGGGCTCTCTCTGTCTTTGCTCCTTAGAGCTGGGGCACCTGAGCCCT 152 

II I I I I I I I I I II I I I I I I I I I I I I I I I 



Db 207 TGGGCCGGGGTCTTACCCATTTTTTTTGGGTTCCGGTTGGAACAAGGGCCCTTTTCCCCT 148



Qy 153 CCTCTGTGCCAGCCTTTCTCCCAGCATTCCTYTCTGGC 190 

I I I I I I I I I I I I I I I I hill 
Db 147 TTTGTTTCCCTGCCTCTGTAATAGCCTTTTTCCCGGCC 110



RESULT 31 

US-09-867-701-2637 

Sequence 2637, Application US/09867701 
Patent No. US20020132237A1
GENERAL INFORMATION: 



APPLICANT: Aglate, Paul A.
APPLICANT: Jones, Robert
APPLICANT: Harlocker, Susan L.
TITLE OF INVENTION: COMPOSITIONS AND METHODS FOR THE THERAPY 
TITLE OF INVENTION: AND DIAGNOSIS OF OVARIAN CANCER 
FILE REFERENCE: 210121.497 

CURRENT APPLICATION NUMBER: US/09/867,701
CURRENT FILING DATE: 2001-05-29
NUMBER OF SEQ ID NOS: 10912
SOFTWARE: FastSEQ for Windows Version 4.0
SEQ ID NO 2637
LENGTH: 257
TYPE: DNA 

ORGANISM: Homo sapien 
US-09-867-701-2637 

Query Match 2.3%; Score 35.6; DB 9; Length 257; 

Best Local Similarity 58.5%; Pred. No. 0.64; 

Matches 62; Conservative 0; Mismatches 44; Indels 0; Gaps 0 

Qy 780 CTGATTTCTGCTCTCCCCTTCCTTGACTCGCCCACCACCTGTCCTGTGTAGATGGAGAAG 839 

I I I I II II I I I I I I I I I I I I I II I I I II I II I I I I I I 
Db 12 CTGAAGCCGGGGCTCCCCCTGCCTGCCTCTCTCTCCTCCTCCCCTCTGGGAATTGGGCAG 71 

Qy 840 GCTCGGAGAGTGGGGGTGCTGGGGGCACAAAATGGAATGAACACTG 885 

I II I I I I I I I I I I I I I I I I I I I I I I 
Db 72 CCCTGGGCAGTTGTACTCATGGGGGCTTAAGATGCAGCTACCTCAG 117 



RESULT 32 

US-09-880-107-893/c 

Sequence 893, Application US/09880107 
Patent No. US20020142981A1 
GENERAL INFORMATION: 
APPLICANT: Horne, Darci T.
APPLICANT: Vockley, Joseph G. 
APPLICANT: Scherf, Uwe 
APPLICANT: Gene Logic, Inc. 

TITLE OF INVENTION: Gene Expression Profiles in Liver Cancer 
FILE REFERENCE: 44921-5028-WO 
CURRENT APPLICATION NUMBER: US/09/880,107
CURRENT FILING DATE: 2001-06-14 
PRIOR APPLICATION NUMBER: US 60/211,379 
PRIOR FILING DATE: 2000-06-14 
PRIOR APPLICATION NUMBER: US 60/237,054 



; PRIOR FILING DATE: 2000-10-02 
; NUMBER OF SEQ ID NOS : 3950 
; SOFTWARE: PatentIn Ver. 2.1
; SEQ ID NO 893 

LENGTH: 330 

TYPE: DNA 
; ORGANISM: Homo sapiens 

; FEATURE:

OTHER INFORMATION: Genbank Accession No. US20020142981A1 AA411813 
US-09-880-107-893 

Query Match 2.3%; Score 35.6; DB 9; Length 330; 

Best Local Similarity 58.5%; Pred. No. 0.74; 

Matches 62; Conservative 0; Mismatches 44; Indels 0; Gaps 0 

Qy 780 CTGATTTCTGCTCTCCCCTTCCTTGACTCGCCCACCACCTGTCCTGTGTAGATGGAGAAG 839 

I I I I II II I I I I I I I I I I I I I II I I I I I I I I I I I I I I 
Db 282 CTGAAGCCGGGGCTCCCCCTGCCTGCCTCTCTCTCCTCCTCACCTCTGGGAATTGGGCAG 223 

Qy 840 GCTCGGAGAGTGGGGGTGCTGGGGGCACAAAATGGAATGAACACTG 885

I II I I I I I I I I I I I I II I I I I I I I I 
Db 222 CCCTGGGCAGTTGTACTCATGGGGGCTTAAGATGCAGCTACCTCAG 177 



RESULT 33 

US-09-867-701-10557/c

; Sequence 10557, Application US/09867701 

; Patent No. US20020132237A1 

; GENERAL INFORMATION: 

; APPLICANT: Aglate, Paul A. 

; APPLICANT: Jones, Robert 

; APPLICANT: Harlocker, Susan L. 

; TITLE OF INVENTION: COMPOSITIONS AND METHODS FOR THE THERAPY 
; TITLE OF INVENTION: AND DIAGNOSIS OF OVARIAN CANCER 
; FILE REFERENCE: 210121.497 

; CURRENT APPLICATION NUMBER: US/09/867,701

; CURRENT FILING DATE: 2001-05-29 

; NUMBER OF SEQ ID NOS: 10912 

; SOFTWARE: FastSEQ for Windows Version 4.0 

; SEQ ID NO 10557 

LENGTH: 440

TYPE: DNA 
; ORGANISM: Homo sapien 
; FEATURE:

NAME/KEY: misc_feature

LOCATION: (1)...(440)

OTHER INFORMATION: n = A, T, C or G 
US-09-867-701-10557 

Query Match 2.3%; Score 35.6; DB 9; Length 440; 

Best Local Similarity 58.5%; Pred. No. 0.88; 

Matches 62; Conservative 0; Mismatches 44; Indels 0; Gaps 0 

Qy 780 CTGATTTCTGCTCTCCCCTTCCTTGACTCGCCCACCACCTGTCCTGTGTAGATGGAGAAG 839

I I I I II II I II I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 261 CTGAAGCCGGGGCTCCCCCTGCCTGCCTCTCTCTCCTCCTCCCCTCTGGGAATTGGGCAG 202 



Qy 84 0 GCTCGGAGAGTGGGGGTGCTGGGGGCACAAAATGGAATGAACACTG 8 85 

I II I I I I I I I I I I I I II III I I I I I 
Db 201 CCCTGGGCAGTTGTACTCATGGGGGCTTAAGATGCAGCTACCTCAG 156 



RESULT 34 
US-10-184-644-348 

Sequence 348, Application US/10184644 
Publication No. US20030044930A1 
GENERAL INFORMATION: 
APPLICANT: Baker, Kevin P.
APPLICANT: Chen, Jian
APPLICANT: Desnoyers, Luc
APPLICANT: Goddard, Audrey
APPLICANT: Godowski, Paul J.
APPLICANT: Gurney, Austin L.
APPLICANT: Pan, James
APPLICANT: Smith, Victoria
APPLICANT: Watanabe, Colin K.
APPLICANT: Wood, William I.
APPLICANT: Zhang, Zemin

TITLE OF INVENTION: SECRETED AND TRANSMEMBRANE POLYPEPTIDES AND NUCLEIC 
TITLE OF INVENTION: ACIDS ENCODING THE SAME 
FILE REFERENCE: P3430R1C227 

CURRENT APPLICATION NUMBER: US/10/184,644
CURRENT FILING DATE: 2002-06-28 

Prior Application removed - See File Wrapper or Palm 
NUMBER OF SEQ ID NOS : 612 
SEQ ID NO 348 
LENGTH: 777
TYPE: PRT 

ORGANISM: Homo Sapien 
US-10-184-644-348 

Query Match 2.3%; Score 35.6; DB 15; Length 777; 

Best Local Similarity 9.4%; Pred. No. 1.2; 

Matches 41; Conservative 138; Mismatches 256; Indels 0; Gaps 0 

Qy 308 GGATTCCAGGGCTGGGTAGGATCGGACAGGGCACTCCCATTGGCTCCTCAGTTAAAGCTG 367 

Db 103 KKIYWPAAKERVELCKLAGKDANTECANFIRVLQPYNKTHIYVCGTGAFHPICGYIDLGV 162 

Qy 368 CCCTGGAGCCGGACAGGCCACTAGAAAATTCACTTGCATTTGCTTCCTGCTAGCCATGGG 427 

: : : : | I : : : : | : : : : : | | | 

Db 163 YKEDIIFKLDTHNLESGRLKCPFDPQQPFASVMTDEYLYSGTASDFLGKDTAFTRSLGPT 222 

Qy 428 TGAGCTGCCCTTTCTGAGTCCAGAGGGAGCCAGAGGGCCTCACATCAACAGAGGGTCTCT 487

Db 223 HDHHYIRTDISEHYWLNGAKFIGTFFIPDTYNPDDDKIYFFFRESSQEGSTSDKTILSRV 282 

Qy 488 GAGCTCCCTGGAGCAAGGTTCGGTCACGGGCACAGAGGCTCGGCACAGCTTAGGTGTCCT 547 

I : : I : : I : : : I I : : : : I | : : : | : : : : : 

Db 283 GRVCKNDVGGQRSLINKWTTFLKARLICSIPGSDGADTYFDELQDIYLLPTRDERNPWY 342 

Qy 548 GCATGTGTCCTACAGCGTCAGGTAAGGGGACCTCCACAGCAAAAAGCTAGGCTCTCTGAT 607 

I : I I : : : | : | : : : : : : | : | : : : : : : | I I : I 

Db 343 GVFTTTSSIFKGSAVCVYSMADIRAVFNGPYAHKESADHRWVQYDGRIPYPRPGTCPSKT 402



Qy 608 TGCCTTTTCTGAATGGGTGGGTGGGCCTGTGGGCTTTGGGTTGTCTGTCCAGCAGATCAG 667 

Db 403 YDPLIKSTRDFPDDVISFIKRHSVMYKSVYPVAGGPTFKRINVDYRLTQIWDHVIAEDG 462

Qy 668 GGTGAAAGTGGACAGTCTGTAACAACAGTGAGTCGTTCCTCCTCCTCCTCCTGCGCAGGG 727 

Db 463 Q YDVMFLGT DI GTVLKWS I S KEKWNMEEWLEELQ I FKHS S 1 1 LNMELS LKQQQLYI GS 522 

Qy 728 CAGAGCCTGGACATT 742

: I : : I : I : 

Db 523 RDGLVQLSLHRCDTY 537 



RESULT 35 
US-10-184-634-348 

Sequence 348, Application US/10184634 
Publication No. US20030068684A1 
GENERAL INFORMATION: 
APPLICANT: Baker, Kevin P.
APPLICANT: Chen, Jian
APPLICANT: Desnoyers, Luc
APPLICANT: Goddard, Audrey
APPLICANT: Godowski, Paul J.
APPLICANT: Gurney, Austin L.
APPLICANT: Pan, James
APPLICANT: Smith, Victoria
APPLICANT: Watanabe, Colin K.
APPLICANT: Wood, William I.
APPLICANT: Zhang, Zemin

TITLE OF INVENTION: SECRETED AND TRANSMEMBRANE POLYPEPTIDES AND NUCLEIC 
TITLE OF INVENTION: ACIDS ENCODING THE SAME 
FILE REFERENCE: P3430R1C217 

CURRENT APPLICATION NUMBER: US/10/184,634
CURRENT FILING DATE: 2002-06-28 

Prior Application removed - See File Wrapper or Palm 
NUMBER OF SEQ ID NOS : 612 
SEQ ID NO 348 
LENGTH: 777
TYPE: PRT 

ORGANISM: Homo Sapien 
US-10-184-634-348 



Query Match 2.3%; Score 35.6; DB 15; Length 777; 

Best Local Similarity 9.4%; Pred. No. 1.2; 

Matches 41; Conservative 138; Mismatches 256; Indels 0; Gaps 0; 
Qy 308 GGATTCCAGGGCTGGGTAGGATCGGACAGGGCACTCCCATTGGCTCCTCAGTTAAAGCTG 367

Db 103 KKIYWPAAKERVELCKLAGKDANTECANFIRVLQPYNKTHIYVCGTGAFHPICGYIDLGV 162 



Qy 368 CCCTGGAGCCGGACAGGCCACTAGAAAATTCACTTGCATTTGCTTCCTGCTAGCCATGGG 427 

: : : : I I : : : : I : : : : : I | I 

Db 163 YKEDIIFKLDTHNLESGRLKCPFDPQQPFASVMTDEYLYSGTASDFLGKDTAFTRSLGPT 222 

Qy 428 TGAGCTGCCCTTTCTGAGTCCAGAGGGAGCCAGAGGGCCTCACATCAACAGAGGGTCTCT 487

Db 223 HDHHYIRTDISEHYWLNGAKFIGTFFIPDTYNPDDDKIYFFFRESSQEGSTSDKTILSRV 282

Qy 488 GAGCTCCCTGGAGCAAGGTTCGGTCACGGGCACAGAGGCTCGGCACAGCTTAGGTGTCCT 547

I : : I : : I : : : I I : : : : I I : : : I : : : : : 

Db 283 GRVCKNDVGGQRSLINKWTTFLKARLICSIPGSDGADTYFDELQDIYLLPTRDERNPWY 342 

Qy 548 GCATGTGTCCTACAGCGTCAGGTAAGGGGACCTCCACAGCAAAAAGCTAGGCTCTCTGAT 607

I : I I : : : | : | : : : : : : | : | : : : : : : | I I : I 

Db 343 GVFTTTSSIFKGSAVCVYSMADIRAVFNGPYAHKESADHRWVQYDGRIPYPRPGTCPSKT 402

Qy 608 TGCCTTTTCTGAATGGGTGGGTGGGCCTGTGGGCTTTGGGTTGTCTGTCCAGCAGATCAG 667 

Db 403 YDPLIKSTRDFPDDVISFIKRHSVMYKSVYPVAGGPTFKRINVDYRLTQIWDHVIAEDG 462 

Qy 668 GGTGAAAGTGGACAGTCTGTAACAACAGTGAGTCGTTCCTCCTCCTCCTCCTGCGCAGGG 727 

: : : I I : : : : : it it t t i t i i \ t 

Db 463 QYDVMFLGTDI GTVLKWS I SKEKWNMEEWLEELQI FKHS S 1 1 LNMELS LKQQQLYI GS 522 

Qy 728 CAGAGCCTGGACATT 742

: I : : I : I : 

Db 523 RDGLVQLSLHRCDTY 537 



RESULT 36 

US-10-087-192-1894/c

; Sequence 1894, Application US/10087192 

; Publication No. US20020182586A1

; GENERAL INFORMATION: 

; APPLICANT: Morris, David W. 

; APPLICANT: Engelhard, Eric K. 

; TITLE OF INVENTION: NOVEL COMPOSITIONS AND METHODS FOR
; TITLE OF INVENTION: CANCER
; FILE REFERENCE: 529452000122
; CURRENT APPLICATION NUMBER: US/10/087,192
; CURRENT FILING DATE: 2002-03-01
; PRIOR APPLICATION NUMBER: US 09/747,377
; PRIOR FILING DATE: 2000-12-22
; PRIOR APPLICATION NUMBER: US 09/798,586
; PRIOR FILING DATE: 2001-03-02
; NUMBER OF SEQ ID NOS: 2059

; SOFTWARE: FastSEQ for Windows Version 4.0 
; SEQ ID NO 1894 

LENGTH: 35143 
; TYPE: DNA 
; ORGANISM: Homo sapiens 
US-10-087-192-1894 



Query Match 2.3%; Score 35.6; DB 13; Length 35143; 

Best Local Similarity 46.7%; Pred. No. 11; 

Matches 113; Conservative 0; Mismatches 129; Indels 0; Gaps 0; 

Qy   646 GGTTGTCTGTCCAGCAGATCAGGGTGAAAGTGGACAGTCTGTAACAACAGTGAGTCGTTC 705
         ||  | |||| |    | ||    |    |   ||||  | |   || |   ||   |
Db 25274 GGCAGGCTGTTCTCTGGTTCCAACTACTTGCCCACAGGATCTCTAAAGACCCAGGAATGG 25215

Qy   706 CTCCTCCTCCTCCTGCGCAGGGCAGAGCCTGGACATTAAAACATGCCCTGCCTGAAGCCG 765
            ||  | |    |   ||   ||| || || |   |   ||||    ||  | ||  |
Db 25214 GGGCTATTGCCAGGGGTTAGAAGAGAACCAGGTCCCAAGGGCATGGTGGGCGGGCAGATG 25155

Qy   766 CTTGCTGCTTCTCACTGATTTCTGCTCTCCCCTTCCTTGACTCGCCCACCACCTGTCCTG 825
          || | |   || |  ||||  |    ||  | ||||  ||  ||    |    | ||||
Db 25154 GTTCCAGAGCCTTAGAGATTCATAGGTTCTTCCTCCTCCACCAGCTGCTCCGAGGGCCTG 25095

Qy   826 TGTAGATGGAGAAGGCTCGGAGAGTGGGGGTGCTGGGGGCACAAAATGGAATGAACACTG 885
         ||  || ||| |||| | |||   ||| |   | |||       || ||  | | |   |
Db 25094 TGGGGAGGGACAAGGGTGGGATGCTGGAGCACCAGGGCTGCAGCAAGGGCCTTAGCTAAG 25035

Qy   886 CT 887
         ||
Db 25034 CT 25033



RESULT 37 

US-09-563-728A-36/c

; Sequence 36, Application US/09563728A 

; Publication No. US20030078216A1 

; GENERAL INFORMATION: 

; APPLICANT: MacLeod, Alan R 

; APPLICANT: Li, Zoumei 

; APPLICANT: Besterman, Jeffrey M 

; TITLE OF INVENTION: Inhibition of Histone Deaceylase 
; FILE REFERENCE: 106101.229 

; CURRENT APPLICATION NUMBER: US/09/563,728A

CURRENT FILING DATE: 2000-05-03 
; PRIOR APPLICATION NUMBER: 60/132,287 

PRIOR FILING DATE: 1999-05-03 
; NUMBER OF SEQ ID NOS: 36
; SOFTWARE: PatentIn Ver. 2.1
; SEQ ID NO 36 

LENGTH: 122186 
; TYPE: DNA 

ORGANISM: Homo sapiens 
US-09-563-728A-36 



Query Match 2.3%; Score 35.6; DB 10; Length 122186; 

Best Local Similarity 46.7%; Pred. No. 22; 

Matches 113; Conservative 0; Mismatches 129; Indels 0; Gaps 0; 

Qy    646 GGTTGTCTGTCCAGCAGATCAGGGTGAAAGTGGACAGTCTGTAACAACAGTGAGTCGTTC 705
          ||  | |||| |    | ||    |    |   ||||  | |   || |   ||   |
Db 107424 GGCAGGCTGTTCTCTGGTTCCAACTACTTGCCCACAGGATCTCTAAAGACCCAGGAATGG 107365

Qy    706 CTCCTCCTCCTCCTGCGCAGGGCAGAGCCTGGACATTAAAACATGCCCTGCCTGAAGCCG 765
             ||  | |    |   ||   ||| || || |   |   ||||    ||  | ||  |
Db 107364 GGGCTATTGCCAGGGGTTAGAAGAGAACCAGGTCCCAAGGGCATGGTGGGCGGGCAGATG 107305

Qy    766 CTTGCTGCTTCTCACTGATTTCTGCTCTCCCCTTCCTTGACTCGCCCACCACCTGTCCTG 825
           || | |   || |  ||||  |    ||  | ||||  ||  ||    |    | ||||
Db 107304 GTTCCAGAGCCTTAGAGATTCATAGGTTCTTCCTCCTCCACCAGCTGCTCCGAGGGCCTG 107245

Qy    826 TGTAGATGGAGAAGGCTCGGAGAGTGGGGGTGCTGGGGGCACAAAATGGAATGAACACTG 885
          ||  || ||| |||| | |||   ||| |   | |||       || ||  | | |   |
Db 107244 TGGGGAGGGACAAGGGTGGGATGCTGGAGCACCAGGGCTGCAGCAAGGGCCTTAGCTAAG 107185

Qy    886 CT 887
          ||
Db 107184 CT 107183



RESULT 38 
US-10-142-426-358 

Sequence 358, Application US/10142426 
Publication No. US20040048333A1 
GENERAL INFORMATION: 
APPLICANT: Baker, Kevin P.
APPLICANT: Beresini, Maureen
APPLICANT: DeForge, Laura
APPLICANT: Desnoyers, Luc
APPLICANT: Filvaroff, Ellen
APPLICANT: Gao, Wei-Qiang
APPLICANT: Gerritsen, Mary E.
APPLICANT: Goddard, Audrey
APPLICANT: Godowski, Paul J.
APPLICANT: Gurney, Austin L.
APPLICANT: Sherwood, Steven
APPLICANT: Smith, Victoria
APPLICANT: Stewart, Timothy A.
APPLICANT: Tumas, Daniel
APPLICANT: Watanabe, Colin K
APPLICANT: Wood, William
APPLICANT: Zhang, Zemin

TITLE OF INVENTION: SECRETED AND TRANSMEMBRANE POLYPEPTIDES AND NUCLEIC 
TITLE OF INVENTION: ACIDS ENCODING THE SAME 
FILE REFERENCE: P3330R1C224 

CURRENT APPLICATION NUMBER: US/10/142,426
CURRENT FILING DATE: 2002-05-09 

Prior Application removed - See File Wrapper or Palm 
NUMBER OF SEQ ID NOS : 550 
SEQ ID NO 358 
LENGTH: 1049 
TYPE: PRT 

ORGANISM: Homo Sapien 
US-10-142-426-358 



Query Match 2.3%; Score 35.4; DB 13; Length 1049; 

Best Local Similarity 4.3%; Pred. No. 1.7; 

Matches 30; Conservative 183; Mismatches 484; Indels 0; Gaps 0 



Qy 208 ACACCGTGTGTTCTGCCTATTGTCGAGATAAGGACACTCTGGCTAAAGGTACATCAGATA 267

Db 1 MVFPMWTLKRQILILFNIILISKLLGARWFPKTLPCDVTLDVPKNHVIVDCTDKHLTEIP 60

Qy 268 ATGGCATCGTTGGCCAAATTGGTGAACTGTTATCTCACGAGGATTCCAGGGCTGGGTAGG 327

|

Db 61 GGIPTNTTNLTLTINHIPDISPASFHRLDHLVEIDFRCNCVPIPLGSKNNMCIKRLQIKP 120

Qy 328 ATCGGACAGGGCACTCCCATTGGCTCCTCAGTTAAAGCTGCCCTGGAGCCGGACAGGCCA 387

Db 121 RSFSGLTYLKSLYLDGNQLLEIPQGLPPSLQLLSLEANNIFSIRKENLTELANIEILYLG 180

Qy 388 CTAGAAAATTCACTTGCATTTGCTTCCTGCTAGCCATGGGTGAGCTGCCCTTTCTGAGTC 447

Db 181 QNCYYRNPCYVSYSIEKDAFLNLTKLKVLSLKDNNVTAVPTVLPSTLTELYLYNNMIAKI 240

Qy 448 CAGAGGGAGCCAGAGGGCCTCACATCAACAGAGGGTCTCTGAGCTCCCTGGAGCAAGGTT 507

Db 241 QEDDFNNLNQLQILDLSGNCPRCYNAPFPCAPCKNNSPLQIPVNAFDALTELKVLRLHSN 300

Qy 508 CGGTCACGGGCACAGAGGCTCGGCACAGCTTAGGTGTCCTGCATGTGTCCTACAGCGTCA 567 

Db 301 SLQHVPPRWFKNINKLQELDLSQNFLAKEIGDAKFLHFLPSLIQLDLSFNFELQVYRASM 360 

Qy 568 GGTAAGGGGACCTCCACAGCAAAAAGCTAGGCTCTCTGATTGCCTTTTCTGAATGGGTGG 627 

I :: : : : :|:: :: : : | : | : : 

Db 361 NLSQAFSSLKSLKILRIRGYVFKELKSFNLSPLHNLQNLEVLDLGTNFIKIANLSMFKQF 420 

Qy 628 GTGGGCCTGTGGGCTTTGGGTTGTCTGTCCAGCAGATCAGGGTGAAAGTGGACAGTCTGT 687

Db 421 KRLKVIDLSVNKISPSGDSSEVGFCSNARTSVESYEPQVLEQLHYFRYDKYARSCRFKNK 480 

Qy 688 AACAACAGTGAGTCGTTCCTCCTCCTCCTCCTGCGCAGGGCAGAGCCTGGACATTAAAAC 747 

I: ::: : ::| I : : ::::: :| : : 

Db 481 EASFMSVNESCYKYGQTLDLSKNSIFFVKSSDFQHLSFLKCLNLSGNLISQTLNGSEFQP 540 

Qy 748 ATGCCCTGCCTGAAGCCGCTTGCTGCTTCTCACTGATTTCTGCTCTCCCCTTCCTTGACT 807

: : : : : :| :: : : :: : |:: | :: : 

Db 541 LAELRYLDFSNNRLDLLHSTAFEELHKLEVLDISSNSHYFQSEGITHMLNFTKNLKVLQK 600 

Qy 808 CGCCCACCACCTGTCCTGTGTAGATGGAGAAGGCTCGGAGAGTGGGGGTGCTGGGGGCAC 867

: : : | : : : : : : : : : : | : : : : : 

Db 601 LMMNDNDISSSTSRTMESESLRTLEFRGNHLDVLWREGDNRYLQLFKNLLKLEELDISKN 660 

Qy 868 AAAATGGAATGAACACTGCTGAAGGAATGCAGGGTTC 904

Db 661 SLSFLPSGVFDGMPPNLKNLSLAKNGLKSFSWKKLQC 697 



RESULT 39 
US-10-123-155-358 

Sequence 358, Application US/10123155 
Publication No. US20030068794A1 
GENERAL INFORMATION: 



; APPLICANT: Baker, Kevin P.
; APPLICANT: Beresini, Maureen
; APPLICANT: DeForge, Laura
; APPLICANT: Desnoyers, Luc
; APPLICANT: Filvaroff, Ellen
; APPLICANT: Gao, Wei-Qiang
; APPLICANT: Gerritsen, Mary E.
; APPLICANT: Goddard, Audrey
; APPLICANT: Godowski, Paul J.
; APPLICANT: Gurney, Austin L.
; APPLICANT: Sherwood, Steven
; APPLICANT: Smith, Victoria
; APPLICANT: Stewart, Timothy A.
; APPLICANT: Tumas, Daniel
; APPLICANT: Watanabe, Colin K
; APPLICANT: Wood, William
; APPLICANT: Zhang, Zemin

; TITLE OF INVENTION: SECRETED AND TRANSMEMBRANE POLYPEPTIDES AND NUCLEIC 

TITLE OF INVENTION: ACIDS ENCODING THE SAME 
; FILE REFERENCE: P3330R1C30 

; CURRENT APPLICATION NUMBER: US/10/123,155 
; CURRENT FILING DATE: 2002-04-15 

; Prior Application removed - See Palm or File Wrapper 
; NUMBER OF SEQ ID NOS : 550 
; SEQ ID NO 358 
; LENGTH: 1049 
TYPE: PRT 

ORGANISM: Homo Sapien 
US-10-123-155-358 

Query Match 2.3%; Score 35.4; DB 15; Length 1049; 

Best Local Similarity 4.3%; Pred. No. 1.7; 

Matches 30; Conservative 183; Mismatches 484; Indels 0; Gaps 0 

Qy 208 ACACCGTGTGTTCTGCCTATTGTCGAGATAAGGACACTCTGGCTAAAGGTACATCAGATA 267

Db 1 MVFPMWTLKRQILILFNIILISKLLGARWFPKTLPCDVTLDVPKNHVIVDCTDKHLTEIP 60 

Qy 268 ATGGCATCGTTGGCCAAATTGGTGAACTGTTATCTCACGAGGATTCCAGGGCTGGGTAGG 327 

I I :::::::::: | : : | : : : 

Db 61 GGIPTNTTNLTLTINHIPDISPASFHRLDHLVEIDFRCNCVPIPLGSKNNMCIKRLQIKP 120

Qy 328 ATCGGACAGGGCACTCCCATTGGCTCCTCAGTTAAAGCTGCCCTGGAGCCGGACAGGCCA 387

: : I : : : : : : :

Db 121 RSFSGLTYLKSLYLDGNQLLEIPQGLPPSLQLLSLEANNIFSIRKENLTELANIEILYLG 180

Qy 388 CTAGAAAATTCACTTGCATTTGCTTCCTGCTAGCCATGGGTGAGCTGCCCTTTCTGAGTC 447

Db 181 QNCYYRNPCYVSYSIEKDAFLNLTKLKVLSLKDNNVTAVPTVLPSTLTELYLYNNMIAKI 240

Qy 448 CAGAGGGAGCCAGAGGGCCTCACATCAACAGAGGGTCTCTGAGCTCCCTGGAGCAAGGTT 507

Db 241 QEDDFNNLNQLQILDLSGNCPRCYNAPFPCAPCKNNSPLQIPVNAFDALTELKVLRLHSN 300 

Qy 508 CGGTCACGGGCACAGAGGCTCGGCACAGCTTAGGTGTCCTGCATGTGTCCTACAGCGTCA 567 

: : : : : : : I : : : : : : : : : : : 

Db 301 SLQHVPPRWFKNINKLQELDLSQNFLAKEIGDAKFLHFLPSLIQLDLSFNFELQVYRASM 360 

Qy 568 GGTAAGGGGACCTCCACAGCAAAAAGCTAGGCTCTCTGATTGCCTTTTCTGAATGGGTGG 627

I :::::: I : : : : : : | : | : : 

Db 361 NLSQAFSSLKSLKILRIRGYVFKELKSFNLSPLHNLQNLEVLDLGTNFIKIANLSMFKQF 420 

Qy 628 GTGGGCCTGTGGGCTTTGGGTTGTCTGTCCAGCAGATCAGGGTGAAAGTGGACAGTCTGT 687

Db 421 KRLKVIDLSVNKISPSGDSSEVGFCSNARTSVESYEPQVLEQLHYFRYDKYARSCRFKNK 480

Qy 688 AACAACAGTGAGTCGTTCCTCCTCCTCCTCCTGCGCAGGGCAGAGCCTGGACATTAAAAC 747

Db 481 EASFMSVNESCYKYGQTLDLSKNSIFFVKSSDFQHLSFLKCLNLSGNLISQTLNGSEFQP 540 



Qy 748 ATGCCCTGCCTGAAGCCGCTTGCTGCTTCTCACTGATTTCTGCTCTCCCCTTCCTTGACT 807

Db 541 LAELRYLDFSNNRLDLLHSTAFEELHKLEVLDISSNSHYFQSEGITHMLNFTKNLKVLQK 600

Qy 808 CGCCCACCACCTGTCCTGTGTAGATGGAGAAGGCTCGGAGAGTGGGGGTGCTGGGGGCAC 867

Db 601 LMMNDNDISSSTSRTMESESLRTLEFRGNHLDVLWREGDNRYLQLFKNLLKLEELDISKN 660

Qy 868 AAAATGGAATGAACACTGCTGAAGGAATGCAGGGTTC 904

: : : : : : I : : : : : I 

Db 661 SLSFLPSGVFDGMPPNLKNLSLAKNGLKSFSWKKLQC 697 



RESULT 40
US-10-146-731-358 

Sequence 358, Application US/10146731 
Publication No. US20030129692A1 
GENERAL INFORMATION: 
APPLICANT: Baker, Kevin P.
APPLICANT: Beresini, Maureen
APPLICANT: DeForge, Laura
APPLICANT: Desnoyers, Luc
APPLICANT: Filvaroff, Ellen
APPLICANT: Gao, Wei-Qiang
APPLICANT: Gerritsen, Mary E.
APPLICANT: Goddard, Audrey
APPLICANT: Godowski, Paul J.
APPLICANT: Gurney, Austin L.
APPLICANT: Sherwood, Steven
APPLICANT: Smith, Victoria
APPLICANT: Stewart, Timothy A.
APPLICANT: Tumas, Daniel
APPLICANT: Watanabe, Colin K
APPLICANT: Wood, William
APPLICANT: Zhang, Zemin

TITLE OF INVENTION: SECRETED AND TRANSMEMBRANE POLYPEPTIDES AND NUCLEIC 
TITLE OF INVENTION: ACIDS ENCODING THE SAME 
FILE REFERENCE: P3330R1C323 

CURRENT APPLICATION NUMBER: US/10/146,731
CURRENT FILING DATE: 2002-05-15 

Prior Application removed - See File Wrapper or Palm 
NUMBER OF SEQ ID NOS : 550 
SEQ ID NO 358 
LENGTH: 1049 
TYPE: PRT 

ORGANISM: Homo Sapien 
US-10-146-731-358 



Query Match 2.3%; Score 35.4; DB 15; Length 1049; 

Best Local Similarity 4.3%; Pred. No. 1.7; 

Matches 30; Conservative 183; Mismatches 484; Indels 0; Gaps 0

Qy 208 ACACCGTGTGTTCTGCCTATTGTCGAGATAAGGACACTCTGGCTAAAGGTACATCAGATA 267

|

Db 1 MVFPMWTLKRQILILFNIILISKLLGARWFPKTLPCDVTLDVPKNHVIVDCTDKHLTEIP 60



Qy 268 ATGGCATCGTTGGCCAAATTGGTGAACTGTTATCTCACGAGGATTCCAGGGCTGGGTAGG 327 

I I :::::::::: | : : | : : : 

Db 61 GGIPTNTTNLTLTINHIPDISPASFHRLDHLVEIDFRCNCVPIPLGSKNNMCIKRLQIKP 120

Qy 328 ATCGGACAGGGCACTCCCATTGGCTCCTCAGTTAAAGCTGCCCTGGAGCCGGACAGGCCA 387

Db 121 RSFSGLTYLKSLYLDGNQLLEIPQGLPPSLQLLSLEANNIFSIRKENLTELANIEILYLG 180

Qy 388 CTAGAAAATTCACTTGCATTTGCTTCCTGCTAGCCATGGGTGAGCTGCCCTTTCTGAGTC 447

: : : : : I : : : : I : | : : | : : : 

Db 181 QNCYYRNPCYVSYSIEKDAFLNLTKLKVLSLKDNNVTAVPTVLPSTLTELYLYNNMIAKI 240 

Qy 448 CAGAGGGAGCCAGAGGGCCTCACATCAACAGAGGGTCTCTGAGCTCCCTGGAGCAAGGTT 507

Db 241 QEDDFNNLNQLQILDLSGNCPRCYNAPFPCAPCKNNSPLQIPVNAFDALTELKVLRLHSN 300

Qy 508 CGGTCACGGGCACAGAGGCTCGGCACAGCTTAGGTGTCCTGCATGTGTCCTACAGCGTCA 567

Db 301 SLQHVPPRWFKNINKLQELDLSQNFLAKEIGDAKFLHFLPSLIQLDLSFNFELQVYRASM 360 

Qy 568 GGTAAGGGGACCTCCACAGCAAAAAGCTAGGCTCTCTGATTGCCTTTTCTGAATGGGTGG 627 

I :: : : : :|:: :: : : | : | : : 

Db 361 NLSQAFSSLKSLKILRIRGYVFKELKSFNLSPLHNLQNLEVLDLGTNFIKIANLSMFKQF 420 

Qy 628 GTGGGCCTGTGGGCTTTGGGTTGTCTGTCCAGCAGATCAGGGTGAAAGTGGACAGTCTGT 687

Db 421 KRLKVIDLSVNKISPSGDSSEVGFCSNARTSVESYEPQVLEQLHYFRYDKYARSCRFKNK 480 

Qy 688 AACAACAGTGAGTCGTTCCTCCTCCTCCTCCTGCGCAGGGCAGAGCCTGGACATTAAAAC 747 

I ::::::: I I : : : : : : : : | : : 

Db 481 EASFMSVNESCYKYGQTLDLSKNSIFFVKSSDFQHLSFLKCLNLSGNLISQTLNGSEFQP 540 

Qy 748 ATGCCCTGCCTGAAGCCGCTTGCTGCTTCTCACTGATTTCTGCTCTCCCCTTCCTTGACT 807

Db 541 LAELRYLDFSNNRLDLLHSTAFEELHKLEVLDISSNSHYFQSEGITHMLNFTKNLKVLQK 600

Qy 808 CGCCCACCACCTGTCCTGTGTAGATGGAGAAGGCTCGGAGAGTGGGGGTGCTGGGGGCAC 867

Db 601 LMMNDNDISSSTSRTMESESLRTLEFRGNHLDVLWREGDNRYLQLFKNLLKLEELDISKN 660

Qy 868 AAAATGGAATGAACACTGCTGAAGGAATGCAGGGTTC 904

: : : : : : I : :: : : I 

Db 661 SLSFLPSGVFDGMPPNLKNLSLAKNGLKSFSWKKLQC 697 

RESULT 41 
US-10-140-472-358 

Sequence 358, Application US/10140472 
Publication No. US20030138888A1
GENERAL INFORMATION: 



APPLICANT: Baker, Kevin P.
APPLICANT: Beresini, Maureen
APPLICANT: DeForge, Laura
APPLICANT: Desnoyers, Luc
APPLICANT: Filvaroff, Ellen
APPLICANT: Gao, Wei-Qiang
APPLICANT: Gerritsen, Mary E.
APPLICANT: Goddard, Audrey
APPLICANT: Godowski, Paul J.
APPLICANT: Gurney, Austin L.
APPLICANT: Sherwood, Steven
APPLICANT: Smith, Victoria
APPLICANT: Stewart, Timothy A.
APPLICANT: Tumas, Daniel
APPLICANT: Watanabe, Colin K
APPLICANT: Wood, William
APPLICANT: Zhang, Zemin

TITLE OF INVENTION: SECRETED AND TRANSMEMBRANE POLYPEPTIDES AND NUCLEIC 
TITLE OF INVENTION: ACIDS ENCODING THE SAME 
FILE REFERENCE: P3330R1C168 

CURRENT APPLICATION NUMBER: US/10/140,472
CURRENT FILING DATE: 2002-05-06 

Prior Apploication removed - See File Wrapper or Palm 
NUMBER OF SEQ ID NOS : 550 
SEQ ID NO 358 
LENGTH: 1049 
TYPE: PRT 

ORGANISM: Homo Sapien 
US-10-140-472-358 



Query Match 2.3%; Score 35.4; DB 15; Length 1049; 

Best Local Similarity 4.3%; Pred. No. 1.7; 

Matches 30; Conservative 183; Mismatches 484; Indels 0; Gaps 0; 

Qy 208 ACACCGTGTGTTCTGCCTATTGTCGAGATAAGGACACTCTGGCTAAAGGTACATCAGATA 267

Db 1 MVFPMWTLKRQILILFNIILISKLLGARWFPKTLPCDVTLDVPKNHVIVDCTDKHLTEIP 60

Qy 268 ATGGCATCGTTGGCCAAATTGGTGAACTGTTATCTCACGAGGATTCCAGGGCTGGGTAGG 327

| | :::::::::: | : : | : : :

Db 61 GGIPTNTTNLTLTINHIPDISPASFHRLDHLVEIDFRCNCVPIPLGSKNNMCIKRLQIKP 120

Qy 328 ATCGGACAGGGCACTCCCATTGGCTCCTCAGTTAAAGCTGCCCTGGAGCCGGACAGGCCA 387

Db 121 RSFSGLTYLKSLYLDGNQLLEIPQGLPPSLQLLSLEANNIFSIRKENLTELANIEILYLG 180

Qy 388 CTAGAAAATTCACTTGCATTTGCTTCCTGCTAGCCATGGGTGAGCTGCCCTTTCTGAGTC 447

Db 181 QNCYYRNPCYVSYSIEKDAFLNLTKLKVLSLKDNNVTAVPTVLPSTLTELYLYNNMIAKI 240

Qy 448 CAGAGGGAGCCAGAGGGCCTCACATCAACAGAGGGTCTCTGAGCTCCCTGGAGCAAGGTT 507

Db 241 QEDDFNNLNQLQILDLSGNCPRCYNAPFPCAPCKNNSPLQIPVNAFDALTELKVLRLHSN 300

Qy 508 CGGTCACGGGCACAGAGGCTCGGCACAGCTTAGGTGTCCTGCATGTGTCCTACAGCGTCA 567

Db 301 SLQHVPPRWFKNINKLQELDLSQNFLAKEIGDAKFLHFLPSLIQLDLSFNFELQVYRASM 360 

Qy 568 GGTAAGGGGACCTCCACAGCAAAAAGCTAGGCTCTCTGATTGCCTTTTCTGAATGGGTGG 627 

I :: : : : :|:: :: : : | : | : : 

Db 361 NLSQAFSSLKSLKILRIRGYVFKELKSFNLSPLHNLQNLEVLDLGTNFIKIANLSMFKQF 420 

Qy 628 GTGGGCCTGTGGGCTTTGGGTTGTCTGTCCAGCAGATCAGGGTGAAAGTGGACAGTCTGT 687

Db 421 KRLKVIDLSVNKISPSGDSSEVGFCSNARTSVESYEPQVLEQLHYFRYDKYARSCRFKNK 480

Qy 688 AACAACAGTGAGTCGTTCCTCCTCCTCCTCCTGCGCAGGGCAGAGCCTGGACATTAAAAC 747

I: ::: : ::| I : : ::::: :| : :

Db 481 EASFMSVNESCYKYGQTLDLSKNSIFFVKSSDFQHLSFLKCLNLSGNLISQTLNGSEFQP 540

Qy 748 ATGCCCTGCCTGAAGCCGCTTGCTGCTTCTCACTGATTTCTGCTCTCCCCTTCCTTGACT 807

: : : : : : | : : : : : : : | : : | : : : 

Db 541 LAELRYLDFSNNRLDLLHSTAFEELHKLEVLDISSNSHYFQSEGITHMLNFTKNLKVLQK 600 

Qy 808 CGCCCACCACCTGTCCTGTGTAGATGGAGAAGGCTCGGAGAGTGGGGGTGCTGGGGGCAC 867 

: : : | : : : : : : : : : : | : : : : : 

Db 601 LMMNDNDISSSTSRTMESESLRTLEFRGNHLDVLWREGDNRYLQLFKNLLKLEELDISKN 660 

Qy 868 AAAATGGAATGAACACTGCTGAAGGAATGCAGGGTTC 904

: : : : : : | : : : : : | 

Db 661 SLSFLPSGVFDGMPPNLKNLSLAKNGLKSFSWKKLQC 697 



RESULT 42 
US-10-141-761-358 

Sequence 358, Application US/10141761 
Publication No. US20030148432A1 
GENERAL INFORMATION: 
APPLICANT: Baker, Kevin P.
APPLICANT: Beresini, Maureen
APPLICANT: DeForge, Laura
APPLICANT: Desnoyers, Luc
APPLICANT: Filvaroff, Ellen
APPLICANT: Gao, Wei-Qiang
APPLICANT: Gerritsen, Mary E.
APPLICANT: Goddard, Audrey
APPLICANT: Godowski, Paul J.
APPLICANT: Gurney, Austin L.
APPLICANT: Sherwood, Steven
APPLICANT: Smith, Victoria
APPLICANT: Stewart, Timothy A.
APPLICANT: Tumas, Daniel
APPLICANT: Watanabe, Colin K
APPLICANT: Wood, William
APPLICANT: Zhang, Zemin

TITLE OF INVENTION: SECRETED AND TRANSMEMBRANE POLYPEPTIDES AND NUCLEIC 
TITLE OF INVENTION: ACIDS ENCODING THE SAME 
FILE REFERENCE: P3330R1C198 

CURRENT APPLICATION NUMBER: US/10/141,761
CURRENT FILING DATE: 2002-05-08 

Prior Application removed - See Palm or File Wrapper 
NUMBER OF SEQ ID NOS : 550 
SEQ ID NO 358 
LENGTH: 1049
TYPE: PRT 

ORGANISM: Homo Sapien 
US-10-141-761-358 



Query Match 2.3%; Score 35.4; DB 15; Length 1049; 

Best Local Similarity 4.3%; Pred. No. 1.7; 

Matches 30; Conservative 183; Mismatches 484; Indels 0; Gaps 0; 



Qy 208 ACACCGTGTGTTCTGCCTATTGTCGAGATAAGGACACTCTGGCTAAAGGTACATCAGATA 267

Db 1 MVFPMWTLKRQILILFNIILISKLLGARWFPKTLPCDVTLDVPKNHVIVDCTDKHLTEIP 60 

Qy 268 ATGGCATCGTTGGCCAAATTGGTGAACTGTTATCTCACGAGGATTCCAGGGCTGGGTAGG 327 

| | :::::::::: | : : | : : : 

Db 61 GGIPTNTTNLTLTINHIPDISPASFHRLDHLVEIDFRCNCVPIPLGSKNNMCIKRLQIKP 120

Qy 328 ATCGGACAGGGCACTCCCATTGGCTCCTCAGTTAAAGCTGCCCTGGAGCCGGACAGGCCA 387 

: : I : : : : : : : 

Db 121 RSFSGLTYLKSLYLDGNQLLEIPQGLPPSLQLLSLEANNIFSIRKENLTELANIEILYLG 180 

Qy 388 CTAGAAAATTCACTTGCATTTGCTTCCTGCTAGCCATGGGTGAGCTGCCCTTTCTGAGTC 447

: : : : : I : : : : I : I : : I : : :

Db 181 QNCYYRNPCYVSYSIEKDAFLNLTKLKVLSLKDNNVTAVPTVLPSTLTELYLYNNMIAKI 240

Qy 448 CAGAGGGAGCCAGAGGGCCTCACATCAACAGAGGGTCTCTGAGCTCCCTGGAGCAAGGTT 507

Db 241 QEDDFNNLNQLQILDLSGNCPRCYNAPFPCAPCKNNSPLQIPVNAFDALTELKVLRLHSN 300 

Qy 508 CGGTCACGGGCACAGAGGCTCGGCACAGCTTAGGTGTCCTGCATGTGTCCTACAGCGTCA 567 

Db 301 SLQHVPPRWFKNINKLQELDLSQNFLAKEIGDAKFLHFLPSLIQLDLSFNFELQVYRASM 360 

Qy 568 GGTAAGGGGACCTCCACAGCAAAAAGCTAGGCTCTCTGATTGCCTTTTCTGAATGGGTGG 627 

| :::::: | :: :: : : | : | : : 

Db 361 NLSQAFSSLKSLKILRIRGYVFKELKSFNLSPLHNLQNLEVLDLGTNFIKIANLSMFKQF 420

Qy 628 GTGGGCCTGTGGGCTTTGGGTTGTCTGTCCAGCAGATCAGGGTGAAAGTGGACAGTCTGT 687

Db 421 KRLKVIDLSVNKISPSGDSSEVGFCSNARTSVESYEPQVLEQLHYFRYDKYARSCRFKNK 480 

Qy 688 AACAACAGTGAGTCGTTCCTCCTCCTCCTCCTGCGCAGGGCAGAGCCTGGACATTAAAAC 747 

Db 481 EASFMSVNESCYKYGQTLDLSKNSIFFVKSSDFQHLSFLKCLNLSGNLISQTLNGSEFQP 540 

Qy 748 ATGCCCTGCCTGAAGCCGCTTGCTGCTTCTCACTGATTTCTGCTCTCCCCTTCCTTGACT 807

: : : : : :| :: : : :: : |:: | :: : 

Db 541 LAELRYLDFSNNRLDLLHSTAFEELHKLEVLDISSNSHYFQSEGITHMLNFTKNLKVLQK 600 

Qy 808 CGCCCACCACCTGTCCTGTGTAGATGGAGAAGGCTCGGAGAGTGGGGGTGCTGGGGGCAC 867

Db 601 LMMNDNDISSSTSRTMESESLRTLEFRGNHLDVLWREGDNRYLQLFKNLLKLEELDISKN 660

Qy 868 AAAATGGAATGAACACTGCTGAAGGAATGCAGGGTTC 904

: : : : : : I : : : : : I 

Db 661 SLSFLPSGVFDGMPPNLKNLSLAKNGLKSFSWKKLQC 697 

RESULT 43
US-10-142-885-358 

; Sequence 358, Application US/10142885 

; Publication No. US20030157604A1

; GENERAL INFORMATION: 

; APPLICANT: Baker, Kevin P. 

; APPLICANT: Beresini, Maureen 



APPLICANT: DeForge, Laura
APPLICANT: Desnoyers, Luc
APPLICANT: Filvaroff, Ellen
APPLICANT: Gao, Wei-Qiang
APPLICANT: Gerritsen, Mary E.
APPLICANT: Goddard, Audrey
APPLICANT: Godowski, Paul J.
APPLICANT: Gurney, Austin L.
APPLICANT: Sherwood, Steven
APPLICANT: Smith, Victoria
APPLICANT: Stewart, Timothy A.
APPLICANT: Tumas, Daniel
APPLICANT: Watanabe, Colin K
APPLICANT: Wood, William
APPLICANT: Zhang, Zemin

TITLE OF INVENTION: SECRETED AND TRANSMEMBRANE POLYPEPTIDES AND NUCLEIC 
TITLE OF INVENTION: ACIDS ENCODING THE SAME 
FILE REFERENCE: P3330R1C248 

CURRENT APPLICATION NUMBER: US/10/142,885
CURRENT FILING DATE: 2002-05-10 

Prior Apploication removed - See File Wrapper or Palm 
NUMBER OF SEQ ID NOS : 550 
SEQ ID NO 358 
LENGTH: 1049 
TYPE: PRT 

ORGANISM: Homo Sapien 
US-10-142-885-358 

Query Match 2.3%; Score 35.4; DB 15; Length 1049; 

Best Local Similarity 4.3%; Pred. No. 1.7; 

Matches 30; Conservative 183; Mismatches 484; Indels 0; Gaps 0; 

Qy 208 ACACCGTGTGTTCTGCCTATTGTCGAGATAAGGACACTCTGGCTAAAGGTACATCAGATA 267

Db 1 MVFPMWTLKRQILILFNIILISKLLGARWFPKTLPCDVTLDVPKNHVIVDCTDKHLTEIP 60

Qy 268 ATGGCATCGTTGGCCAAATTGGTGAACTGTTATCTCACGAGGATTCCAGGGCTGGGTAGG 327

| | :::::::::: | : : | :: :

Db 61 GGIPTNTTNLTLTINHIPDISPASFHRLDHLVEIDFRCNCVPIPLGSKNNMCIKRLQIKP 120

Qy 328 ATCGGACAGGGCACTCCCATTGGCTCCTCAGTTAAAGCTGCCCTGGAGCCGGACAGGCCA 387

Db 121 RSFSGLTYLKSLYLDGNQLLEIPQGLPPSLQLLSLEANNIFSIRKENLTELANIEILYLG 180

Qy 388 CTAGAAAATTCACTTGCATTTGCTTCCTGCTAGCCATGGGTGAGCTGCCCTTTCTGAGTC 447

: : : : : | : : : : | : | : : | : : :

Db 181 QNCYYRNPCYVSYSIEKDAFLNLTKLKVLSLKDNNVTAVPTVLPSTLTELYLYNNMIAKI 240

Qy 448 CAGAGGGAGCCAGAGGGCCTCACATCAACAGAGGGTCTCTGAGCTCCCTGGAGCAAGGTT 507

Db 241 QEDDFNNLNQLQILDLSGNCPRCYNAPFPCAPCKNNSPLQIPVNAFDALTELKVLRLHSN 300 

Qy 508 CGGTCACGGGCACAGAGGCTCGGCACAGCTTAGGTGTCCTGCATGTGTCCTACAGCGTCA 567 

Db 301 SLQHVPPRWFKNINKLQELDLSQNFLAKEIGDAKFLHFLPSLIQLDLSFNFELQVYRASM 360 



Qy 568 GGTAAGGGGACCTCCACAGCAAAAAGCTAGGCTCTCTGATTGCCTTTTCTGAATGGGTGG 627

Db 361 NLSQAFSSLKSLKILRIRGYVFKELKSFNLSPLHNLQNLEVLDLGTNFIKIANLSMFKQF 420

Qy 628 GTGGGCCTGTGGGCTTTGGGTTGTCTGTCCAGCAGATCAGGGTGAAAGTGGACAGTCTGT 687

Db 421 KRLKVIDLSVNKISPSGDSSEVGFCSNARTSVESYEPQVLEQLHYFRYDKYARSCRFKNK 480

Qy 688 AACAACAGTGAGTCGTTCCTCCTCCTCCTCCTGCGCAGGGCAGAGCCTGGACATTAAAAC 747

| ::::::: I I : : : : : : : : | : :

Db 481 EASFMSVNESCYKYGQTLDLSKNSIFFVKSSDFQHLSFLKCLNLSGNLISQTLNGSEFQP 540

Qy 748 ATGCCCTGCCTGAAGCCGCTTGCTGCTTCTCACTGATTTCTGCTCTCCCCTTCCTTGACT 807 

: : : : : :| :: : : :: : |:: | :: : 

Db 541 LAELRYLDFSNNRLDLLHSTAFEELHKLEVLDISSNSHYFQSEGITHMLNFTKNLKVLQK 600 

Qy 808 CGCCCACCACCTGTCCTGTGTAGATGGAGAAGGCTCGGAGAGTGGGGGTGCTGGGGGCAC 867

Db 601 LMMNDNDISSSTSRTMESESLRTLEFRGNHLDVLWREGDNRYLQLFKNLLKLEELDISKN 660 

Qy 868 AAAATGGAATGAACACTGCTGAAGGAATGCAGGGTTC 904

: : : : : : I : : : : : I 

Db 661 SLSFLPSGVFDGMPPNLKNLSLAKNGLKSFSWKKLQC 697 



RESULT 44
US-10-158-790-358 

Sequence 358, Application US/10158790 
Publication No. US20030180879A1 
GENERAL INFORMATION: 
APPLICANT: Baker, Kevin P.
APPLICANT: Beresini, Maureen
APPLICANT: DeForge, Laura
APPLICANT: Desnoyers, Luc
APPLICANT: Filvaroff, Ellen
APPLICANT: Gao, Wei-Qiang
APPLICANT: Gerritsen, Mary E.
APPLICANT: Goddard, Audrey
APPLICANT: Godowski, Paul J.
APPLICANT: Gurney, Austin L.
APPLICANT: Sherwood, Steven
APPLICANT: Smith, Victoria
APPLICANT: Stewart, Timothy A.
APPLICANT: Tumas, Daniel
APPLICANT: Watanabe, Colin K
APPLICANT: Wood, William
APPLICANT: Zhang, Zemin

TITLE OF INVENTION: SECRETED AND TRANSMEMBRANE POLYPEPTIDES AND NUCLEIC 
TITLE OF INVENTION: ACIDS ENCODING THE SAME 
FILE REFERENCE: P3330R1C448 

CURRENT APPLICATION NUMBER: US/10/158,790
CURRENT FILING DATE: 2002-05-30 

Prior Application removed - See File Wrapper or Palm 
NUMBER OF SEQ ID NOS : 550 
SEQ ID NO 358 
LENGTH: 1049
TYPE: PRT 

ORGANISM: Homo Sapien 



US-10-158-790-358 

Query Match 2.3%; Score 35.4; DB 15; Length 1049; 

Best Local Similarity 4.3%; Pred. No. 1.7; 

Matches 30; Conservative 183; Mismatches 484; Indels 0; Gaps 0 

Qy 208 ACACCGTGTGTTCTGCCTATTGTCGAGATAAGGACACTCTGGCTAAAGGTACATCAGATA 267

Db 1 MVFPMWTLKRQILILFNIILISKLLGARWFPKTLPCDVTLDVPKNHVIVDCTDKHLTEIP 60

Qy 268 ATGGCATCGTTGGCCAAATTGGTGAACTGTTATCTCACGAGGATTCCAGGGCTGGGTAGG 327

I | :::::::::: | : : | :: :

Db 61 GGIPTNTTNLTLTINHIPDISPASFHRLDHLVEIDFRCNCVPIPLGSKNNMCIKRLQIKP 120

Qy 328 ATCGGACAGGGCACTCCCATTGGCTCCTCAGTTAAAGCTGCCCTGGAGCCGGACAGGCCA 387

Db 121 RSFSGLTYLKSLYLDGNQLLEIPQGLPPSLQLLSLEANNIFSIRKENLTELANIEILYLG 180

Qy 388 CTAGAAAATTCACTTGCATTTGCTTCCTGCTAGCCATGGGTGAGCTGCCCTTTCTGAGTC 447

Db 181 QNCYYRNPCYVSYSIEKDAFLNLTKLKVLSLKDNNVTAVPTVLPSTLTELYLYNNMIAKI 240

Qy 448 CAGAGGGAGCCAGAGGGCCTCACATCAACAGAGGGTCTCTGAGCTCCCTGGAGCAAGGTT 507

Db 241 QEDDFNNLNQLQILDLSGNCPRCYNAPFPCAPCKNNSPLQIPVNAFDALTELKVLRLHSN 300 

Qy 508 CGGTCACGGGCACAGAGGCTCGGCACAGCTTAGGTGTCCTGCATGTGTCCTACAGCGTCA 567 

Db 301 SLQHVPPRWFKNINKLQELDLSQNFLAKEIGDAKFLHFLPSLIQLDLSFNFELQVYRASM 360 

Qy 568 GGTAAGGGGACCTCCACAGCAAAAAGCTAGGCTCTCTGATTGCCTTTTCTGAATGGGTGG 627 

I :::::: I : : : : : : | : | : : 

Db 361 NLSQAFSSLKSLKILRIRGYVFKELKSFNLSPLHNLQNLEVLDLGTNFIKIANLSMFKQF 420 

Qy 628 GTGGGCCTGTGGGCTTTGGGTTGTCTGTCCAGCAGATCAGGGTGAAAGTGGACAGTCTGT 687 

Db 421 KRLKVIDLSVNKISPSGDSSEVGFCSNARTSVESYEPQVLEQLHYFRYDKYARSCRFKNK 480

Qy 688 AACAACAGTGAGTCGTTCCTCCTCCTCCTCCTGCGCAGGGCAGAGCCTGGACATTAAAAC 747

Db 481 EASFMSVNESCYKYGQTLDLSKNSIFFVKSSDFQHLSFLKCLNLSGNLISQTLNGSEFQP 540 

Qy 748 ATGCCCTGCCTGAAGCCGCTTGCTGCTTCTCACTGATTTCTGCTCTCCCCTTCCTTGACT 807 

Db 541 LAELRYLDFSNNRLDLLHSTAFEELHKLEVLDISSNSHYFQSEGITHMLNFTKNLKVLQK 600

Qy 808 CGCCCACCACCTGTCCTGTGTAGATGGAGAAGGCTCGGAGAGTGGGGGTGCTGGGGGCAC 867

Db 601 LMMNDNDISSSTSRTMESESLRTLEFRGNHLDVLWREGDNRYLQLFKNLLKLEELDISKN 660

Qy 868 AAAATGGAATGAACACTGCTGAAGGAATGCAGGGTTC 904

: : : : : : | : : : : : | 

Db 661 SLSFLPSGVFDGMPPNLKNLSLAKNGLKSFSWKKLQC 697 



RESULT 45 
US-10-137-871-358 



Sequence 358, Application US/10137871 
Publication No. US20030207350A1 
GENERAL INFORMATION: 
APPLICANT: Baker, Kevin P.
APPLICANT: Beresini, Maureen
APPLICANT: DeForge, Laura
APPLICANT: Desnoyers, Luc
APPLICANT: Filvaroff, Ellen
APPLICANT: Gao, Wei-Qiang
APPLICANT: Gerritsen, Mary E.
APPLICANT: Goddard, Audrey
APPLICANT: Godowski, Paul J.
APPLICANT: Gurney, Austin L.
APPLICANT: Sherwood, Steven
APPLICANT: Smith, Victoria
APPLICANT: Stewart, Timothy A.
APPLICANT: Tumas, Daniel
APPLICANT: Watanabe, Colin K
APPLICANT: Wood, William
APPLICANT: Zhang, Zemin

TITLE OF INVENTION: SECRETED AND TRANSMEMBRANE POLYPEPTIDES AND NUCLEIC 
TITLE OF INVENTION: ACIDS ENCODING THE SAME 
FILE REFERENCE: P3330R1C153 

CURRENT APPLICATION NUMBER: US/10/137,871 
CURRENT FILING DATE: 2002-05-03 

Prior Application removed - See Palm or File Wrapper 
NUMBER OF SEQ ID NOS : 550 
SEQ ID NO 358 
LENGTH: 1049
TYPE: PRT 

ORGANISM: Homo Sapien 
US-10-137-871-358 



Query Match 2.3%; Score 35.4; DB 16; Length 1049; 

Best Local Similarity 4.3%; Pred. No. 1.7; 

Matches 30; Conservative 183; Mismatches 484; Indels 0; Gaps 0 



Qy 208 ACACCGTGTGTTCTGCCTATTGTCGAGATAAGGACACTCTGGCTAAAGGTACATCAGATA 267

Db 1 MVFPMWTLKRQILILFNIILISKLLGARWFPKTLPCDVTLDVPKNHVIVDCTDKHLTEIP 60

Qy 268 ATGGCATCGTTGGCCAAATTGGTGAACTGTTATCTCACGAGGATTCCAGGGCTGGGTAGG 327

I I ::::::::::|: : | : : :

Db 61 GGIPTNTTNLTLTINHIPDISPASFHRLDHLVEIDFRCNCVPIPLGSKNNMCIKRLQIKP 120

Qy 328 ATCGGACAGGGCACTCCCATTGGCTCCTCAGTTAAAGCTGCCCTGGAGCCGGACAGGCCA 387

Db 121 RSFSGLTYLKSLYLDGNQLLEIPQGLPPSLQLLSLEANNIFSIRKENLTELANIEILYLG 180

Qy 388 CTAGAAAATTCACTTGCATTTGCTTCCTGCTAGCCATGGGTGAGCTGCCCTTTCTGAGTC 447

: : : : : | : : : : | : | : : | : : :

Db 181 QNCYYRNPCYVSYSIEKDAFLNLTKLKVLSLKDNNVTAVPTVLPSTLTELYLYNNMIAKI 240

Qy 448 CAGAGGGAGCCAGAGGGCCTCACATCAACAGAGGGTCTCTGAGCTCCCTGGAGCAAGGTT 507

Db 241 QEDDFNNLNQLQILDLSGNCPRCYNAPFPCAPCKNNSPLQIPVNAFDALTELKVLRLHSN 300



Qy 508 CGGTCACGGGCACAGAGGCTCGGCACAGCTTAGGTGTCCTGCATGTGTCCTACAGCGTCA 567 


Db 301 SLQHVPPRWFKNINKLQELDLSQNFLAKEIGDAKFLHFLPSLIQLDLSFNFELQVYRASM 360 

Qy 568 GGTAAGGGGACCTCCACAGCAAAAAGCTAGGCTCTCTGATTGCCTTTTCTGAATGGGTGG 627 

I :: : : : :|:: :: : : I : I : : 

Db 361 NLSQAFSSLKSLKILRIRGYVFKELKSFNLSPLHNLQNLEVLDLGTNFIKIANLSMFKQF 420 

Qy 628 GTGGGCCTGTGGGCTTTGGGTTGTCTGTCCAGCAGATCAGGGTGAAAGTGGACAGTCTGT 687

Db 421 KRLKVIDLSVNKISPSGDSSEVGFCSNARTSVESYEPQVLEQLHYFRYDKYARSCRFKNK 480 

Qy 688 AACAACAGTGAGTCGTTCCTCCTCCTCCTCCTGCGCAGGGCAGAGCCTGGACATTAAAAC 747 

| ::::::: I I : : : : : : : : I : : 

Db 481 EASFMSVNESCYKYGQTLDLSKNSIFFVKSSDFQHLSFLKCLNLSGNLISQTLNGSEFQP 540 

Qy 748 ATGCCCTGCCTGAAGCCGCTTGCTGCTTCTCACTGATTTCTGCTCTCCCCTTCCTTGACT 807

: : : : : : | : : : : : : : | : : | : : : 

Db 541 LAELRYLDFSNNRLDLLHSTAFEELHKLEVLDISSNSHYFQSEGITHMLNFTKNLKVLQK 600 

Qy 808 CGCCCACCACCTGTCCTGTGTAGATGGAGAAGGCTCGGAGAGTGGGGGTGCTGGGGGCAC 867

Db 601 LMMNDNDISSSTSRTMESESLRTLEFRGNHLDVLWREGDNRYLQLFKNLLKLEELDISKN 660 

Qy 868 AAAATGGAATGAACACTGCTGAAGGAATGCAGGGTTC 904

Db 661 SLSFLPSGVFDGMPPNLKNLSLAKNGLKSFSWKKLQC 697 



RESULT 46
US-10-140-923-358 

Sequence 358, Application US/10140923 
Publication No. US20030207355A1
GENERAL INFORMATION: 
APPLICANT: Baker, Kevin P. 
APPLICANT: Beresini, Maureen
APPLICANT: DeForge, Laura
APPLICANT: Desnoyers, Luc
APPLICANT: Filvaroff, Ellen
APPLICANT: Gao, Wei-Qiang
APPLICANT: Gerritsen, Mary E.
APPLICANT: Goddard, Audrey
APPLICANT: Godowski, Paul J.
APPLICANT: Gurney, Austin L.
APPLICANT: Sherwood, Steven
APPLICANT: Smith, Victoria
APPLICANT: Stewart, Timothy A.
APPLICANT: Tumas, Daniel
APPLICANT: Watanabe, Colin K
APPLICANT: Wood, William
APPLICANT: Zhang, Zemin 

TITLE OF INVENTION: SECRETED AND TRANSMEMBRANE POLYPEPTIDES AND NUCLEIC 
TITLE OF INVENTION: ACIDS ENCODING THE SAME 
FILE REFERENCE: P3330R1C188 

CURRENT APPLICATION NUMBER: US/10/140,923
CURRENT FILING DATE: 2002-05-07 

Prior Application removed - See Palm or File Wrapper 



NUMBER OF SEQ ID NOS: 550
SEQ ID NO 358
LENGTH: 1049
TYPE: PRT

ORGANISM: Homo Sapien 
US-10-140-923-358 

Query Match 2.3%; Score 35.4; DB 16; Length 1049; 

Best Local Similarity 4.3%; Pred. No. 1.7; 



Matches 30; Conservative 183; Mismatches 484; Indels 0; Gaps 0

Qy 208 ACACCGTGTGTTCTGCCTATTGTCGAGATAAGGACACTCTGGCTAAAGGTACATCAGATA 267

Db 1 MVFPMWTLKRQILILFNIILISKLLGARWFPKTLPCDVTLDVPKNHVIVDCTDKHLTEIP 60

Qy 268 ATGGCATCGTTGGCCAAATTGGTGAACTGTTATCTCACGAGGATTCCAGGGCTGGGTAGG 327

I I ::::::::: :| : : | :: : 

Db 61 GGIPTNTTNLTLTINHIPDISPASFHRLDHLVEIDFRCNCVPIPLGSKNNMCIKRLQIKP 120

Qy 328 ATCGGACAGGGCACTCCCATTGGCTCCTCAGTTAAAGCTGCCCTGGAGCCGGACAGGCCA 387

Db 121 RSFSGLTYLKSLYLDGNQLLEIPQGLPPSLQLLSLEANNIFSIRKENLTELANIEILYLG 180

Qy 388 CTAGAAAATTCACTTGCATTTGCTTCCTGCTAGCCATGGGTGAGCTGCCCTTTCTGAGTC 447

Db 181 QNCYYRNPCYVSYSIEKDAFLNLTKLKVLSLKDNNVTAVPTVLPSTLTELYLYNNMIAKI 240 

Qy 448 CAGAGGGAGCCAGAGGGCCTCACATCAACAGAGGGTCTCTGAGCTCCCTGGAGCAAGGTT 507 

Db 241 QEDDFNNLNQLQILDLSGNCPRCYNAPFPCAPCKNNSPLQIPVNAFDALTELKVLRLHSN 300

Qy 508 CGGTCACGGGCACAGAGGCTCGGCACAGCTTAGGTGTCCTGCATGTGTCCTACAGCGTCA 567 

: : : : : : : | : : : : : : : : : : : 

Db 301 SLQHVPPRWFKNINKLQELDLSQNFLAKEIGDAKFLHFLPSLIQLDLSFNFELQVYRASM 360 

Qy 568 GGTAAGGGGACCTCCACAGCAAAAAGCTAGGCTCTCTGATTGCCTTTTCTGAATGGGTGG 627 

I :: : : : :|:: :: : : | : | : : 

Db 361 NLSQAFSSLKSLKILRIRGYVFKELKSFNLSPLHNLQNLEVLDLGTNFIKIANLSMFKQF 420 

Qy 628 GTGGGCCTGTGGGCTTTGGGTTGTCTGTCCAGCAGATCAGGGTGAAAGTGGACAGTCTGT 687 

Db 421 KRLKVIDLSVNKISPSGDSSEVGFCSNARTSVESYEPQVLEQLHYFRYDKYARSCRFKNK 480 

Qy 688 AACAACAGTGAGTCGTTCCTCCTCCTCCTCCTGCGCAGGGCAGAGCCTGGACATTAAAAC 747

I : : : : : : : | | : : : : : : : : | : : 

Db 481 EASFMSVNESCYKYGQTLDLSKNSIFFVKSSDFQHLSFLKCLNLSGNLISQTLNGSEFQP 540 

Qy 748 ATGCCCTGCCTGAAGCCGCTTGCTGCTTCTCACTGATTTCTGCTCTCCCCTTCCTTGACT 807

Db 541 LAELRYLDFSNNRLDLLHSTAFEELHKLEVLDISSNSHYFQSEGITHMLNFTKNLKVLQK 600

Qy 808 CGCCCACCACCTGTCCTGTGTAGATGGAGAAGGCTCGGAGAGTGGGGGTGCTGGGGGCAC 867

Db 601 LMMNDNDISSSTSRTMESESLRTLEFRGNHLDVLWREGDNRYLQLFKNLLKLEELDISKN 660

Qy 868 AAAATGGAATGAACACTGCTGAAGGAATGCAGGGTTC 904

Db 661 SLSFLPSGVFDGMPPNLKNLSLAKNGLKSFSWKKLQC 697



RESULT 47 
US-10-141-756-358 

Sequence 358, Application US/10141756 
Publication No. US20030207359A1
GENERAL INFORMATION: 
APPLICANT: Baker, Kevin P. 
APPLICANT: Beresini, Maureen
APPLICANT: DeForge, Laura
APPLICANT: Desnoyers, Luc
APPLICANT: Filvaroff, Ellen
APPLICANT: Gao, Wei-Qiang
APPLICANT: Gerritsen, Mary E.
APPLICANT: Goddard, Audrey
APPLICANT: Godowski, Paul J.
APPLICANT: Gurney, Austin L.
APPLICANT: Sherwood, Steven
APPLICANT: Smith, Victoria
APPLICANT: Stewart, Timothy A.
APPLICANT: Tumas, Daniel
APPLICANT: Watanabe, Colin K 
APPLICANT: Wood, William 
APPLICANT: Zhang, Zemin 

TITLE OF INVENTION: SECRETED AND TRANSMEMBRANE POLYPEPTIDES AND NUCLEIC 
TITLE OF INVENTION: ACIDS ENCODING THE SAME 
FILE REFERENCE: P3330R1C200 

CURRENT APPLICATION NUMBER: US/10/141,756
CURRENT FILING DATE: 2002-05-08 

Prior Apploication removed - See File Wrapper or Palm 
NUMBER OF SEQ ID NOS : 550 
SEQ ID NO 358 
LENGTH: 1049 
TYPE: PRT 

ORGANISM: Homo Sapien 
US-10-141-756-358 



Query Match 2.3%; Score 35.4; DB 16; Length 1049; 

Best Local Similarity 4.3%; Pred. No. 1.7; 

Matches 30; Conservative 183; Mismatches 484; Indels 0; Gaps 0; 



Qy 208 ACACCGTGTGTTCTGCCTATTGTCGAGATAAGGACACTCTGGCTAAAGGTACATCAGATA 267

I

Db 1 MVFPMWTLKRQILILFNIILISKLLGARWFPKTLPCDVTLDVPKNHVIVDCTDKHLTEIP 60

Qy 268 ATGGCATCGTTGGCCAAATTGGTGAACTGTTATCTCACGAGGATTCCAGGGCTGGGTAGG 327
I | ::::::::::): : | : : :

Db 61 GGIPTNTTNLTLTINHIPDISPASFHRLDHLVEIDFRCNCVPIPLGSKNNMCIKRLQIKP 120

Qy 328 ATCGGACAGGGCACTCCCATTGGCTCCTCAGTTAAAGCTGCCCTGGAGCCGGACAGGCCA 387

Db 121 RSFSGLTYLKSLYLDGNQLLEIPQGLPPSLQLLSLEANNIFSIRKENLTELANIEILYLG 180

Qy 388 CTAGAAAATTCACTTGCATTTGCTTCCTGCTAGCCATGGGTGAGCTGCCCTTTCTGAGTC 447

Db 181 QNCYYRNPCYVSYSIEKDAFLNLTKLKVLSLKDNNVTAVPTVLPSTLTELYLYNNMIAKI 240



Qy 448 CAGAGGGAGCCAGAGGGCCTCACATCAACAGAGGGTCTCTGAGCTCCCTGGAGCAAGGTT 507

Db 241 QEDDFNNLNQLQILDLSGNCPRCYNAPFPCAPCKNNSPLQIPVNAFDALTELKVLRLHSN 300 



Qy 508 CGGTCACGGGCACAGAGGCTCGGCACAGCTTAGGTGTCCTGCATGTGTCCTACAGCGTCA 567 

Db 301 SLQHVPPRWFKNINKLQELDLSQNFLAKEIGDAKFLHFLPSLIQLDLSFNFELQVYRASM 360 

Qy 568 GGTAAGGGGACCTCCACAGCAAAAAGCTAGGCTCTCTGATTGCCTTTTCTGAATGGGTGG 627 

I :: : : : :|:: :: : : | : | : : 

Db 361 NLSQAFSSLKSLKILRIRGYVFKELKSFNLSPLHNLQNLEVLDLGTNFIKIANLSMFKQF 420 

Qy 628 GTGGGCCTGTGGGCTTTGGGTTGTCTGTCCAGCAGATCAGGGTGAAAGTGGACAGTCTGT 687 

: : : : : : : : : I I : : : : : I : : 

Db 421 KRLKVIDLSVNKISPSGDSSEVGFCSNARTSVESYEPQVLEQLHYFRYDKYARSCRFKNK 480

Qy 688 AACAACAGTGAGTCGTTCCTCCTCCTCCTCCTGCGCAGGGCAGAGCCTGGACATTAAAAC 747

Db 481 EASFMSVNESCYKYGQTLDLSKNSIFFVKSSDFQHLSFLKCLNLSGNLISQTLNGSEFQP 540 

Qy 748 ATGCCCTGCCTGAAGCCGCTTGCTGCTTCTCACTGATTTCTGCTCTCCCCTTCCTTGACT 807

Db 541 LAELRYLDFSNNRLDLLHSTAFEELHKLEVLDISSNSHYFQSEGITHMLNFTKNLKVLQK 600

Qy 808 CGCCCACCACCTGTCCTGTGTAGATGGAGAAGGCTCGGAGAGTGGGGGTGCTGGGGGCAC 867

Db 601 LMMNDNDISSSTSRTMESESLRTLEFRGNHLDVLWREGDNRYLQLFKNLLKLEELDISKN 660

Qy 868 AAAATGGAATGAACACTGCTGAAGGAATGCAGGGTTC 904

: : : : : : I : : : : : I 

Db 661 SLSFLPSGVFDGMPPNLKNLSLAKNGLKSFSWKKLQC 697 



RESULT 48
US-10-141-759-358 

Sequence 358, Application US/10141759 
Publication No. US20030207361A1 
GENERAL INFORMATION:
APPLICANT: Baker, Kevin P.
APPLICANT: Beresini, Maureen
APPLICANT: DeForge, Laura
APPLICANT: Desnoyers, Luc
APPLICANT: Filvaroff, Ellen
APPLICANT: Gao, Wei-Qiang
APPLICANT: Gerritsen, Mary E.
APPLICANT: Goddard, Audrey
APPLICANT: Godowski, Paul J.
APPLICANT: Gurney, Austin L.
APPLICANT: Sherwood, Steven
APPLICANT: Smith, Victoria
APPLICANT: Stewart, Timothy A.
APPLICANT: Tumas, Daniel
APPLICANT: Watanabe, Colin K
APPLICANT: Wood, William
APPLICANT: Zhang, Zemin



TITLE OF INVENTION: SECRETED AND TRANSMEMBRANE POLYPEPTIDES AND NUCLEIC
TITLE OF INVENTION: ACIDS ENCODING THE SAME
FILE REFERENCE: P3330R1C197

CURRENT APPLICATION NUMBER: US/10/141,759
CURRENT FILING DATE: 2002-05-08

Prior Apploication removed - See File Wrapper or Palm
NUMBER OF SEQ ID NOS: 550
SEQ ID NO 358
LENGTH: 1049
TYPE: PRT

ORGANISM: Homo Sapien
US-10-141-759-358 



Query Match 2.3%; Score 35.4; DB 16; Length 1049; 

Best Local Similarity 4.3%; Pred. No. 1.7; 

Matches 30; Conservative 183; Mismatches 484; Indels 0; Gaps 0; 

Qy 208 ACACCGTGTGTTCTGCCTATTGTCGAGATAAGGACACTCTGGCTAAAGGTACATCAGATA 267

Db 1 MVFPMWTLKRQILILFNIILISKLLGARWFPKTLPCDVTLDVPKNHVIVDCTDKHLTEIP 60

Qy 268 ATGGCATCGTTGGCCAAATTGGTGAACTGTTATCTCACGAGGATTCCAGGGCTGGGTAGG 327

I I :::::::::: | : : | : : : 

Db 61 GGIPTNTTNLTLTINHIPDISPASFHRLDHLVEIDFRCNCVPIPLGSKNNMCIKRLQIKP 120 

Qy 328 ATCGGACAGGGCACTCCCATTGGCTCCTCAGTTAAAGCTGCCCTGGAGCCGGACAGGCCA 387 

Db 121 RSFSGLTYLKSLYLDGNQLLEIPQGLPPSLQLLSLEANNIFSIRKENLTELANIEILYLG 180 

Qy 388 CTAGAAAATTCACTTGCATTTGCTTCCTGCTAGCCATGGGTGAGCTGCCCTTTCTGAGTC 447

: : : : : | : : : : | : | : : | : : : 

Db 181 QNCYYRNPCYVSYSIEKDAFLNLTKLKVLSLKDNNVTAVPTVLPSTLTELYLYNNMIAKI 240

Qy 448 CAGAGGGAGCCAGAGGGCCTCACATCAACAGAGGGTCTCTGAGCTCCCTGGAGCAAGGTT 507

Db 241 QEDDFNNLNQLQILDLSGNCPRCYNAPFPCAPCKNNSPLQIPVNAFDALTELKVLRLHSN 300 

Qy 508 CGGTCACGGGCACAGAGGCTCGGCACAGCTTAGGTGTCCTGCATGTGTCCTACAGCGTCA 567 

Db 301 SLQHVPPRWFKNINKLQELDLSQNFLAKEIGDAKFLHFLPSLIQLDLSFNFELQVYRASM 360 

Qy 568 GGTAAGGGGACCTCCACAGCAAAAAGCTAGGCTCTCTGATTGCCTTTTCTGAATGGGTGG 627

| :::::: | : : : : : : | : | : : 

Db 361 NLSQAFSSLKSLKILRIRGYVFKELKSFNLSPLHNLQNLEVLDLGTNFIKIANLSMFKQF 420 

Qy 628 GTGGGCCTGTGGGCTTTGGGTTGTCTGTCCAGCAGATCAGGGTGAAAGTGGACAGTCTGT 687

Db 421 KRLKVIDLSVNKISPSGDSSEVGFCSNARTSVESYEPQVLEQLHYFRYDKYARSCRFKNK 480

Qy 688 AACAACAGTGAGTCGTTCCTCCTCCTCCTCCTGCGCAGGGCAGAGCCTGGACATTAAAAC 747

Db 481 EASFMSVNESCYKYGQTLDLSKNSIFFVKSSDFQHLSFLKCLNLSGNLISQTLNGSEFQP 540 



Qy 748 ATGCCCTGCCTGAAGCCGCTTGCTGCTTCTCACTGATTTCTGCTCTCCCCTTCCTTGACT 807

Db 541 LAELRYLDFSNNRLDLLHSTAFEELHKLEVLDISSNSHYFQSEGITHMLNFTKNLKVLQK 600 



Qy 808 CGCCCACCACCTGTCCTGTGTAGATGGAGAAGGCTCGGAGAGTGGGGGTGCTGGGGGCAC 867



Db 601 LMMNDNDISSSTSRTMESESLRTLEFRGNHLDVLWREGDNRYLQLFKNLLKLEELDISKN 660 

Qy 868 AAAATGGAATGAACACTGCTGAAGGAATGCAGGGTTC 904

: : : : : : I : : : : : I 

Db 661 SLSFLPSGVFDGMPPNLKNLSLAKNGLKSFSWKKLQC 697



RESULT 49 
US-10-140-805-358 

Sequence 358, Application US/10140805 
Publication No. US20030207417A1
GENERAL INFORMATION:
APPLICANT: Baker, Kevin P.
APPLICANT: Beresini, Maureen
APPLICANT: DeForge, Laura
APPLICANT: Desnoyers, Luc
APPLICANT: Filvaroff, Ellen
APPLICANT: Gao, Wei-Qiang
APPLICANT: Gerritsen, Mary E.
APPLICANT: Goddard, Audrey
APPLICANT: Godowski, Paul J.
APPLICANT: Gurney, Austin L.
APPLICANT: Sherwood, Steven
APPLICANT: Smith, Victoria
APPLICANT: Stewart, Timothy A.
APPLICANT: Tumas, Daniel
APPLICANT: Watanabe, Colin K
APPLICANT: Wood, William
APPLICANT: Zhang, Zemin

TITLE OF INVENTION: SECRETED AND TRANSMEMBRANE POLYPEPTIDES AND NUCLEIC
TITLE OF INVENTION: ACIDS ENCODING THE SAME



FILE REFERENCE: P3330R1C176 
CURRENT APPLICATION NUMBER: US/10/140,805 
CURRENT FILING DATE: 2002-05-07 

Prior Apploication removed - See File Wrapper or Palm 
NUMBER OF SEQ ID NOS : 550 
SEQ ID NO 358 
LENGTH: 1049 
TYPE: PRT 

ORGANISM: Homo Sapien 
US-10-140-805-358 



Query Match 2.3%; Score 35.4; DB 16; Length 1049; 

Best Local Similarity 4.3%; Pred. No. 1.7; 



Matches 30; Conservative 183; Mismatches 484; Indels 0; Gaps 0; 

Qy 208 ACACCGTGTGTTCTGCCTATTGTCGAGATAAGGACACTCTGGCTAAAGGTACATCAGATA 267

Db 1 MVFPMWTLKRQILILFNIILISKLLGARWFPKTLPCDVTLDVPKNHVIVDCTDKHLTEIP 60 

Qy 268 ATGGCATCGTTGGCCAAATTGGTGAACTGTTATCTCACGAGGATTCCAGGGCTGGGTAGG 327 

I I ::::::::: :| : ' : | :: : 

Db 61 GGIPTNTTNLTLTINHIPDISPASFHRLDHLVEIDFRCNCVPIPLGSKNNMCIKRLQIKP 120

Qy 328 ATCGGACAGGGCACTCCCATTGGCTCCTCAGTTAAAGCTGCCCTGGAGCCGGACAGGCCA 387

Db 121 RSFSGLTYLKSLYLDGNQLLEIPQGLPPSLQLLSLEANNIFSIRKENLTELANIEILYLG 180



Qy 388 CTAGAAAATTCACTTGCATTTGCTTCCTGCTAGCCATGGGTGAGCTGCCCTTTCTGAGTC 447 

: : : : : | : : : : | : | : : | : : : 

Db 181 QNCYYRNPCYVSYSIEKDAFLNLTKLKVLSLKDNNVTAVPTVLPSTLTELYLYNNMIAKI 240 

Qy 448 CAGAGGGAGCCAGAGGGCCTCACATCAACAGAGGGTCTCTGAGCTCCCTGGAGCAAGGTT 507

Db 241 QEDDFNNLNQLQILDLSGNCPRCYNAPFPCAPCKNNSPLQIPVNAFDALTELKVLRLHSN 300 

Qy 508 CGGTCACGGGCACAGAGGCTCGGCACAGCTTAGGTGTCCTGCATGTGTCCTACAGCGTCA 567 

Db 301 SLQHVPPRWFKNINKLQELDLSQNFLAKEIGDAKFLHFLPSLIQLDLSFNFELQVYRASM 360

Qy 568 GGTAAGGGGACCTCCACAGCAAAAAGCTAGGCTCTCTGATTGCCTTTTCTGAATGGGTGG 627 

I :::::: I : : : : : : | : | : : 

Db 361 NLSQAFSSLKSLKILRIRGYVFKELKSFNLSPLHNLQNLEVLDLGTNFIKIANLSMFKQF 420 

Qy 628 GTGGGCCTGTGGGCTTTGGGTTGTCTGTCCAGCAGATCAGGGTGAAAGTGGACAGTCTGT 687 

::::::::: I | : : : :: | : : 

Db 421 KRLKVIDLSVNKISPSGDSSEVGFCSNARTSVESYEPQVLEQLHYFRYDKYARSCRFKNK 480 

Qy 688 AACAACAGTGAGTCGTTCCTCCTCCTCCTCCTGCGCAGGGCAGAGCCTGGACATTAAAAC 747 

Db 481 EASFMSVNESCYKYGQTLDLSKNSIFFVKSSDFQHLSFLKCLNLSGNLISQTLNGSEFQP 540

Qy 748 ATGCCCTGCCTGAAGCCGCTTGCTGCTTCTCACTGATTTCTGCTCTCCCCTTCCTTGACT 807

: : : : : :| :: : : :: : |:: | :: : 

Db 541 LAELRYLDFSNNRLDLLHSTAFEELHKLEVLDISSNSHYFQSEGITHMLNFTKNLKVLQK 600 

Qy 808 CGCCCACCACCTGTCCTGTGTAGATGGAGAAGGCTCGGAGAGTGGGGGTGCTGGGGGCAC 867

: : : | : : : : : :::::]:: : : : 

Db 601 LMMNDNDISSSTSRTMESESLRTLEFRGNHLDVLWREGDNRYLQLFKNLLKLEELDISKN 660 

Qy 868 AAAATGGAATGAACACTGCTGAAGGAATGCAGGGTTC 904

: : : : : : I : :: :: | 

Db 661 SLSFLPSGVFDGMPPNLKNLSLAKNGLKSFSWKKLQC 697 



RESULT 50 
US-10-140-864-358 

Sequence 358, Application US/10140864 
Publication No. US20030207419A1 
GENERAL INFORMATION:
APPLICANT: Baker, Kevin P.
APPLICANT: Beresini, Maureen
APPLICANT: DeForge, Laura
APPLICANT: Desnoyers, Luc
APPLICANT: Filvaroff, Ellen
APPLICANT: Gao, Wei-Qiang
APPLICANT: Gerritsen, Mary E.
APPLICANT: Goddard, Audrey
APPLICANT: Godowski, Paul J.
APPLICANT: Gurney, Austin L.
APPLICANT: Sherwood, Steven
APPLICANT: Smith, Victoria
APPLICANT: Stewart, Timothy A.
APPLICANT: Tumas, Daniel
APPLICANT: Watanabe, Colin K
APPLICANT: Wood, William
APPLICANT: Zhang, Zemin
TITLE OF INVENTION: SECRETED AND TRANSMEMBRANE POLYPEPTIDES AND NUCLEIC
TITLE OF INVENTION: ACIDS ENCODING THE SAME
FILE REFERENCE: P3330R1C184

CURRENT APPLICATION NUMBER: US/10/140,864
CURRENT FILING DATE: 2002-05-07

Prior Application removed - See Palm or File Wrapper
NUMBER OF SEQ ID NOS: 550
SEQ ID NO 358
LENGTH: 1049
TYPE: PRT

ORGANISM: Homo Sapien

US-10-140-864-358 



Query Match 2.3%; Score 35.4; DB 16; Length 1049; 

Best Local Similarity 4.3%; Pred. No. 1.7; 

Matches 30; Conservative 183; Mismatches 484; Indels 0; Gaps 0 

Qy 208 ACACCGTGTGTTCTGCCTATTGTCGAGATAAGGACACTCTGGCTAAAGGTACATCAGATA 267

Db 1 MVFPMWTLKRQILILFNIILISKLLGARWFPKTLPCDVTLDVPKNHVIVDCTDKHLTEIP 60

Qy 268 ATGGCATCGTTGGCCAAATTGGTGAACTGTTATCTCACGAGGATTCCAGGGCTGGGTAGG 327

I I ::::::::::!: : | : : :

Db 61 GGIPTNTTNLTLTINHIPDISPASFHRLDHLVEIDFRCNCVPIPLGSKNNMCIKRLQIKP 120

Qy 328 ATCGGACAGGGCACTCCCATTGGCTCCTCAGTTAAAGCTGCCCTGGAGCCGGACAGGCCA 387 

Db 121 RSFSGLTYLKSLYLDGNQLLEIPQGLPPSLQLLSLEANNIFSIRKENLTELANIEILYLG 180 

Qy 388 CTAGAAAATTCACTTGCATTTGCTTCCTGCTAGCCATGGGTGAGCTGCCCTTTCTGAGTC 447

: : : : : | : : : : | : | : : | : : : 

Db 181 QNCYYRNPCYVSYSIEKDAFLNLTKLKVLSLKDNNVTAVPTVLPSTLTELYLYNNMIAKI 240 

Qy 448 CAGAGGGAGCCAGAGGGCCTCACATCAACAGAGGGTCTCTGAGCTCCCTGGAGCAAGGTT 507

Db 241 QEDDFNNLNQLQILDLSGNCPRCYNAPFPCAPCKNNSPLQIPVNAFDALTELKVLRLHSN 300 

Qy 508 CGGTCACGGGCACAGAGGCTCGGCACAGCTTAGGTGTCCTGCATGTGTCCTACAGCGTCA 567 

Db 301 SLQHVPPRWFKNINKLQELDLSQNFLAKEIGDAKFLHFLPSLIQLDLSFNFELQVYRASM 360 

Qy 568 GGTAAGGGGACCTCCACAGCAAAAAGCTAGGCTCTCTGATTGCCTTTTCTGAATGGGTGG 627

I :::::: I : : : : : : | : | : : 

Db 361 NLSQAFSSLKSLKILRIRGYVFKELKSFNLSPLHNLQNLEVLDLGTNFIKIANLSMFKQF 420 

Qy 628 GTGGGCCTGTGGGCTTTGGGTTGTCTGTCCAGCAGATCAGGGTGAAAGTGGACAGTCTGT 687

Db 421 KRLKVIDLSVNKISPSGDSSEVGFCSNARTSVESYEPQVLEQLHYFRYDKYARSCRFKNK 480 



Qy 688 AACAACAGTGAGTCGTTCCTCCTCCTCCTCCTGCGCAGGGCAGAGCCTGGACATTAAAAC 747

Db 481 EASFMSVNESCYKYGQTLDLSKNSIFFVKSSDFQHLSFLKCLNLSGNLISQTLNGSEFQP 540 



Qy 748 ATGCCCTGCCTGAAGCCGCTTGCTGCTTCTCACTGATTTCTGCTCTCCCCTTCCTTGACT 807 

: : : : : : I : : : : : : : | : : | : : : 

Db 541 LAELRYLDFSNNRLDLLHSTAFEELHKLEVLDISSNSHYFQSEGITHMLNFTKNLKVLQK 600 

Qy 808 CGCCCACCACCTGTCCTGTGTAGATGGAGAAGGCTCGGAGAGTGGGGGTGCTGGGGGCAC 867

Db 601 LMMNDNDISSSTSRTMESESLRTLEFRGNHLDVLWREGDNRYLQLFKNLLKLEELDISKN 660

Qy 868 AAAATGGAATGAACACTGCTGAAGGAATGCAGGGTTC 904

: : : : : : | : :: :: | 
Db 661 SLSFLPSGVFDGMPPNLKNLSLAKNGLKSFSWKKLQC 697 



Search completed: April 29, 2004, 21:08:49 
Job time : 1532.81 secs



